Abstract: The fundamental principle underlying compressed 
sensing is that a signal, which is sparse under some basis 
representation, can be recovered from a small number of linear 
measurements. However, prior knowledge of the sparsity basis 
is essential for the recovery process. This work introduces the 
concept of blind compressed sensing, which avoids the need to 
know the sparsity basis in both the sampling and the recovery 
process. We suggest three possible constraints on the sparsity 
basis that can be added to the problem in order to make its 
solution unique. For each constraint we prove conditions for 
uniqueness, and suggest a simple method to retrieve the solution. 
Under the uniqueness conditions, and as long as the signals 
are sparse enough, we demonstrate through simulations that 
without knowing the sparsity basis our methods can achieve 
results similar to those of standard compressed sensing, which 
rely on prior knowledge of the sparsity basis. This offers a 
general sampling and reconstruction system that fits all sparse 
signals, regardless of the sparsity basis, under the conditions and 
constraints presented in this work. 



I. Introduction 

Sparse signal representations have gained popularity in 
recent years in many theoretical and applied areas Hi-El . 
Roughly speaking, the information content of a sparse signal 
occupies only a small portion of its ambient dimension. For 
example, a finite dimensional vector is sparse if it contains a 
small number of nonzero entries. It is sparse under a basis if 
its representation under a given basis transform is sparse. An 
analog signal is referred to as sparse if, for example, a large 
part of its bandwidth is not exploited 0, (7). Other models 
for analog sparsity are discussed in detail in J5), 0, ©■ 

Compressed sensing (CS) Q, focuses on the role of 
sparsity in reducing the number of measurements needed to 
represent a finite dimensional vector $x \in \mathbb{R}^m$. The vector x is
measured by b = Ax, where A is a matrix of size $n \times m$, with
$n \ll m$. In this formulation, determining x from the given
measurements b is ill-posed in general, since A has fewer
rows than columns and is therefore non-invertible. However,
if x is known to be sparse in a given basis P, then under
additional mild conditions on A [9]-[11], the measurements
b determine x uniquely as long as n is large enough. This
concept was also recently expanded to include sub-Nyquist
sampling of structured analog signals [4], [6], [12].

In principle, recovery from compressed measurements is 
NP-hard. Nonetheless, many suboptimal methods have been 
proposed to approximate its solution Hl-O, [13]-[15]. These
algorithms recover the true value of x when x is sufficiently
sparse and the columns of A are incoherent (TJ, [9]-[11],
[13]. However, all known recovery approaches use the prior
knowledge of the sparsity basis P. 

Dictionary learning (DL) [16]-[20] is another application of
sparse representations. In DL, we are given a set of training
signals, formally the columns of a matrix X. The goal is to
find a dictionary P, such that the columns of X are sparsely
represented as linear combinations of the columns of P. In
[17], the authors study conditions under which the DL problem
yields a unique solution for the given training set X. 

In this work we introduce the concept of blind compressed 
sensing (BCS), in which the goal is to recover a high- 
dimensional vector x from a small number of measurements, 
where the only prior is that there exists some basis in which 
x is sparse. We refer to our setting as blind, since we do not 
require knowledge of the sparsity basis for the sampling or 
the reconstruction. This is in sharp contrast to CS, in which 
recovery necessitates this knowledge. Our BCS framework 
combines elements from both CS and DL. On the one hand, as 
in CS and in contrast to DL, we obtain only low dimensional 
measurements of the signal. On the other hand, we do not 
require prior knowledge of the sparsity basis, which is similar
to the DL problem. The goal of this work is to investigate the
basic conditions under which blind recovery from compressed
measurements is theoretically possible, and to propose
concrete algorithms for this task.

Since the sparsity basis is unknown, the uncertainty about 
the signal x is larger in BCS than in CS. A straightforward 
solution would be to increase the number of measurements. 
However, we show that no increase in the sampling rate
suffices to determine x, unless the number of measurements equals
the dimension of x. Furthermore, we prove that even if we
have multiple signals that share the same (unknown) sparsity
basis, as in DL, BCS remains ill-posed. In order for the
measurements to determine x uniquely we need an additional
constraint on the problem. To prove the concept of BCS we
begin by discussing two simple constraints on the sparsity basis,
which enable blind recovery of a single vector x. We then
turn to our main contribution, which is a BCS framework for 
structured sparsity bases. In this setting, we show that multiple 
vectors sharing the same sparsity pattern are needed to ensure 
recovery. For all of the above formulations we demonstrate via 
simulations that when the signals are sufficiently sparse the 
results of our BCS methods are similar to those obtained by 
standard CS algorithms which use the true, though unknown 
in practice, sparsity basis. When relying on the structural
constraint we require in addition that the number of signals
be large enough. However, the simulations show that the
number of signals needed is reasonable and much smaller than
that used for DL [21]-[24].

The first constraint on the basis we consider relies on the 
fact that over the years there have been several bases that 
have been considered "good" in the sense that they are known
to sparsely represent many natural signals. These include,
for example, various wavelet representations [25] and the
discrete-cosine transform (DCT) [26]. We therefore treat the
setting in which the unknown basis P is one of a finite 
and known set of bases. We develop uniqueness conditions 
and a recovery algorithm by treating this formulation as a 
series of CS problems. To widen the set of possible bases 
that can be treated, the next constraint allows P to contain 
any sparse enough combination of the columns of a given 
dictionary. We show that the resulting CS problem can be 
viewed within the framework of standard CS, or as DL with 
a sparse dictionary [23]. We compare these two approaches
for BCS with a sparse basis. For both classes of constraints we
show that a Gaussian random measurement matrix satisfies the 
uniqueness conditions we develop with probability one. 

Our main contribution is inspired by multichannel systems, 
where the signals from each channel are sparse under separate 
bases. In our setting this translates to the requirement that 
P is block diagonal. For simplicity, and following several 
previous works [27]-[29], we impose in addition that P is
orthogonal. We then choose to measure the set of signals X by
a measurement matrix A consisting of a union of orthogonal
bases. This choice has been used in previous CS and DL works
as well ETIl, [22], [30]-[32]. For technical reasons we also
choose the number of blocks in P as an integer multiple of 
the number of bases in A. Using this structure we develop 
uniqueness results as well as a concrete recovery algorithm. 
The uniqueness condition follows from reformulating the BCS 
problem within the framework of DL and then relying on 
results obtained in that context. In particular, we require an 
ensemble of signals X, all sparse in the same basis. As we 
show, a suitable choice of random matrix A satisfies the 
uniqueness conditions with probability 1. 

Unfortunately, the reduction to an equivalent DL problem,
which is used for the uniqueness proof, does not lead to a
practical recovery algorithm. This is because it
necessitates resolving the signed permutation ambiguity, which 
is inherent in DL. Instead, we propose a simple and direct 
algorithm for recovery, which we refer to as the orthogonal 
block diagonal BCS (OBD-BCS) algorithm. This method finds 
X = PS by computing a basis P and a sparse matrix S using 
two alternating steps. The first step is sparse coding, in which 
P is fixed and S is updated using a standard CS algorithm. 
In the second step S is fixed and P is updated using several 
singular value decompositions (SVD). 

The remainder of the paper is organized as follows. In
Section II we review the fundamentals of CS and define
the BCS problem. In Section III we prove that BCS is
ill-posed by showing that it can be interpreted as a certain
ill-posed DL problem. In Sections IV, V, and VI we consider the
three constrained BCS problems, respectively. A comparison
between the different approaches is provided in Section VII.

II. BCS Problem Definition 

A. Compressed Sensing 

We start by briefly reviewing the main results in the field of
CS needed for our derivations. The goal of CS is to reconstruct
a vector $x \in \mathbb{R}^m$ from measurements $b = Ax$, where
$A \in \mathbb{R}^{n \times m}$ and $n < m$. This problem is ill-posed in general and
therefore has infinitely many possible solutions. In CS we seek
the sparsest solution:



\[
\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{s.t.} \quad b = Ax, \tag{1}
\]



where $\|\cdot\|_0$ is the $\ell_0$ semi-norm, which counts the number of
nonzero elements of the vector. This idea can be generalized
to the case in which x is sparse under a given basis P, so that
there is a sparse vector s such that x = Ps. Problem (1) then
becomes



\[
\hat{s} = \arg\min_{s} \|s\|_0 \quad \text{s.t.} \quad b = APs, \tag{2}
\]



and the reconstructed signal is $\hat{x} = P\hat{s}$. When the maximal
number of nonzero elements in s is known to equal k, we may
consider the objective



\[
\hat{s} = \arg\min_{s} \|b - APs\|_2^2 \quad \text{s.t.} \quad \|s\|_0 \leq k. \tag{3}
\]



An important question is under what conditions (1)-(3) have
a unique solution. In [9] the authors define the spark of a
matrix, denoted by $\sigma(\cdot)$, which is the smallest possible number
of linearly dependent columns. They prove that if s is k-sparse,
and $\sigma(AP) > 2k$, then the solution to (2), or equivalently (3),
is unique. Unfortunately, calculating the spark of a matrix is
a combinatorial problem. However, it can be bounded from below using the
mutual coherence [9], which can be calculated easily. Denoting
the ith column of a matrix D by $d_i$, the mutual coherence of
D is given by



\[
\mu(D) = \max_{i \neq j} \frac{|d_i^T d_j|}{\|d_i\|_2 \, \|d_j\|_2}.
\]

It is easy to see that $\sigma(D) \geq 1 + \frac{1}{\mu(D)}$. Therefore, a sufficient
condition for the uniqueness of the solutions to (2) or (3) is

\[
k < \frac{1}{2}\left(1 + \frac{1}{\mu(AP)}\right).
\]
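The mutual coherence and the resulting sufficient sparsity bound are cheap to evaluate numerically. The following is a minimal sketch in Python with NumPy; the function names `mutual_coherence` and `coherence_sparsity_bound` are illustrative choices and do not come from the paper, and the Gaussian test matrix is only an example.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute normalized inner product between distinct columns of D."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # scale columns to unit norm
    G = np.abs(Dn.T @ Dn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # exclude the i == j terms
    return G.max()

def coherence_sparsity_bound(D):
    """Right-hand side of the sufficient condition k < (1 + 1/mu(D)) / 2."""
    return 0.5 * (1.0 + 1.0 / mutual_coherence(D))

# Illustrative example: a 32 x 64 i.i.d. Gaussian matrix (sizes are an assumption).
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))
mu = mutual_coherence(A)
bound = coherence_sparsity_bound(A)
print(f"mu = {mu:.3f}, uniqueness guaranteed for k < {bound:.3f}")
```

The coherence bound is typically much more pessimistic than the spark condition, which is the price paid for avoiding the combinatorial spark computation.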

Although the uniqueness condition involves the product 
AP, some CS methods are universal. This means that by 
constructing a suitable measurement matrix A, uniqueness is 
guaranteed for any fixed orthogonal basis P. In such cases 
knowledge of P is not necessary for the sampling process. One 
way to achieve this universality property with probability 1 
relies on the next proposition. 

Proposition 1. If A is an i.i.d. Gaussian random matrix of size
$n \times m$, where $n < m$, then $\sigma(AP) = n + 1$ with probability 1
for any fixed orthogonal basis P.

Proof: Due to the properties of Gaussian random variables
and since P is orthogonal, the product AP is also an i.i.d.
Gaussian random matrix. Since any n, or fewer, i.i.d. Gaussian
vectors in $\mathbb{R}^n$ are linearly independent with probability 1,
$\sigma(AP) > n$ with probability 1. On the other hand, more
than n vectors in $\mathbb{R}^n$ are always linearly dependent, therefore
$\sigma(AP) = n + 1$. ■
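Proposition 1 can be checked numerically on a small instance. The sketch below computes the spark by brute force, which is only feasible for tiny matrices; the helper name `spark`, the rank tolerance, and the sizes n = 3, m = 6 are assumptions made for illustration.

```python
import itertools
import numpy as np

def spark(M, tol=1e-10):
    """Smallest number of linearly dependent columns of M, by exhaustive search."""
    n_cols = M.shape[1]
    for size in range(1, n_cols + 1):
        for cols in itertools.combinations(range(n_cols), size):
            # A column subset is dependent when its rank is below its size.
            if np.linalg.matrix_rank(M[:, list(cols)], tol=tol) < size:
                return size
    return n_cols + 1  # convention used here when all columns are independent

rng = np.random.default_rng(1)
n, m = 3, 6
A = rng.standard_normal((n, m))                   # i.i.d. Gaussian measurement matrix
P, _ = np.linalg.qr(rng.standard_normal((m, m)))  # a random orthogonal basis
spark_AP = spark(A @ P)
print(spark_AP, n + 1)
```

The exhaustive search has combinatorial cost, which is exactly why the text resorts to the coherence bound or to probabilistic arguments for realistic dimensions.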

According to Proposition 1, if A is an i.i.d. Gaussian matrix
and the number of nonzero elements in s is k < n/2, then
the uniqueness of the solution to (2) or (3) is guaranteed with
probability 1 for any fixed orthogonal basis P (see also [33]).

Problems (2) and (3) are NP-hard in general. Many suboptimal
methods have been proposed to approximate their
solutions, such as Hi-El, [13]-[15]. These algorithms can
be divided into two main approaches: greedy algorithms and
convex relaxation methods. Greedy algorithms approximate
the solution by selecting the indices of the nonzero elements in
s sequentially. One of the most common methods of this type
is orthogonal matching pursuit (OMP) [13]. Convex relaxation
approaches change the objective in (2) to a convex problem.
The most common of these methods is basis pursuit (BP) [15],
which considers the problem:

\[
\hat{s} = \arg\min_{s} \|s\|_1 \quad \text{s.t.} \quad b = APs. \tag{4}
\]

Under suitable conditions on the product AP and the sparsity
level of the signals, both the greedy algorithms and the convex
relaxation methods recover the true value of s. For instance,
both OMP and BP recover the true value of s when the number
of nonzero elements in s is no more than $\frac{1}{2}\left(1 + \frac{1}{\mu(AP)}\right)$ fl~L
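To make the greedy approach concrete, here is a minimal OMP sketch in Python with NumPy. It is a textbook-style implementation written for illustration, not the code used by the authors. The test dictionary, an identity matrix concatenated with a normalized 16 x 16 Hadamard matrix, is an assumption chosen because its mutual coherence is 1/4, so a 2-sparse vector falls within the coherence-based recovery regime described above.

```python
import numpy as np

def omp(M, b, k, tol=1e-10):
    """Greedy k-sparse approximation of a solution to b = M s."""
    support, coeffs = [], np.zeros(0)
    residual = b.copy()
    col_norms = np.linalg.norm(M, axis=0)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Select the column most correlated with the current residual.
        corr = np.abs(M.T @ residual) / col_norms
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients by least squares.
        coeffs, *_ = np.linalg.lstsq(M[:, support], b, rcond=None)
        residual = b - M[:, support] @ coeffs
    s = np.zeros(M.shape[1])
    if support:
        s[support] = coeffs
    return s

# Illustrative dictionary [I, H/4] with coherence 1/4 (an assumption, not from the paper).
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
M = np.hstack([np.eye(16), H / 4.0])

s_true = np.zeros(32)
s_true[3], s_true[20] = 1.5, -2.0   # 2-sparse, below (1 + 1/0.25) / 2 = 2.5
s_hat = omp(M, M @ s_true, k=2)
print(np.allclose(s_hat, s_true))
```

In this role M stands for the product AP, so running OMP still requires knowing P, which is the limitation that BCS sets out to remove.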

B. BCS Problem Formulation 

Even when the universality property is achieved in CS, all
existing algorithms require knowledge of the sparsity basis
P for the reconstruction process. The idea of BCS is to avoid
entirely the need for this prior knowledge, that is, to perform
both the sampling and the reconstruction of the signals without
knowing under which basis they are sparse.

This problem seems impossible at first, since every signal is 
sparse under a basis that contains the signal itself. This would 
imply that BCS allows reconstruction of any signal from a 
small number of measurements without any prior knowledge, 
which is clearly impossible. Our approach then, is to sample 
an ensemble of signals that are all sparse under the same basis. 
Later on we revisit problems with only one signal, but with 
additional constraints. 

Let $X \in \mathbb{R}^{m \times N}$ denote a matrix whose columns are
the original signals, and let $S \in \mathbb{R}^{m \times N}$ denote the matrix
whose columns are the corresponding sparse vectors, such
that X = PS for some basis $P \in \mathbb{R}^{m \times m}$. The signals
are all sampled using a measurement matrix $A \in \mathbb{R}^{n \times m}$,
producing the matrix B = AX. For the measurements to
be compressed the dimensions should satisfy n < m, where
the compression ratio is L = m/n. Following ifPTl, [24] we
assume the maximal number of nonzero elements in each of
the columns of S is known to equal k. We refer to such
a matrix S as a k-sparse matrix. The BCS problem can be
formulated as follows.

Problem 2. Given the measurements B and the measurement 
matrix A find the signal matrix X such that B = AX where 
X = PS for some basis P and k-sparse matrix S. 

Note that our goal is not to find the basis P and the sparse 
matrix S. We are only interested in the product X = PS. 
In fact, for a given matrix X there is more than one pair of 
matrices P and S such that X = PS. Here we focus on the 
question of whether X can be recovered given the knowledge 
that such a pair exists for X. 



III. Uniqueness 

We now discuss BCS uniqueness, namely the uniqueness of
the signal matrix X which solves Problem 2. Unfortunately,
although Problem 2 seems quite natural, its solution is never
unique, regardless of the choice of measurement matrix A, the
number of signals, or the sparsity level. We prove this result
by reducing the problem to an equivalent one, using the field
of DL, and proving that the solution to the equivalent problem
is not unique.

In Section III-A we review results in the field of DL needed
for our derivation. In Section III-B we use these results to
prove that the BCS problem does not have a unique solution.
In Sections IV, V, and VI we suggest several constraints on the
basis P that ensure uniqueness.

A. Dictionary Learning (DL) 

The field of DL [16]-[20] focuses on finding a sparse matrix
$S \in \mathbb{R}^{m \times N}$ and a dictionary $D \in \mathbb{R}^{n \times m}$ such that B = DS
where only $B \in \mathbb{R}^{n \times N}$ is given. Usually in DL the dimensions
satisfy $n \ll m$. BCS can be viewed as a DL problem with D =
AP where A is known and P is an unknown basis. Thus, one
may view BCS as a DL problem with a constrained dictionary.
However, there is an important difference in the output of DL
and BCS. DL provides the dictionary D = AP and the sparse
matrix S. On the other hand, in BCS we are interested in
recovering the unknown signals X = PS. Therefore, after
performing DL some postprocessing is needed to retrieve P
from D. This is an important distinction which, as we show in
Section VI-B, makes it hard to directly apply DL algorithms.

An important question is the uniqueness of the DL factorization.
That is, given a matrix $B \in \mathbb{R}^{n \times N}$, what are
the conditions for the uniqueness of the pair of matrices
$D \in \mathbb{R}^{n \times m}$ and $S \in \mathbb{R}^{m \times N}$ such that B = DS where S
is k-sparse? Note that if some pair D, S satisfies B = DS,
then applying a scaling and signed permutation to the columns of D,
and the inverse operation to the rows of S, does not change the product B = DS.
Therefore, there cannot be a unique pair D, S. In the context
of DL the term uniqueness refers to uniqueness up to scaling
and signed permutation. In fact, in most cases we can assume without loss of
generality that the columns of the dictionary have
unit norm, such that there is no ambiguity in the scaling, but
only in the signed permutation.

Conditions for DL uniqueness when the dictionary D is
orthogonal or just square are provided in [28] and [29].
However, in BCS D = AP is in general rectangular. In [17]
the authors prove sufficient conditions on D and S for the
uniqueness of a general DL. We refer to the condition on D as
the spark condition and to the conditions on S as the richness
conditions. The main idea behind these conditions is that D
should satisfy the condition for CS uniqueness, and that the
columns of S should be diverse regarding both the locations
and the values of the nonzero elements. More specifically, the
conditions for DL uniqueness are:

• The spark condition: $\sigma(D) > 2k$.

• The richness conditions:

1) All the columns of S have exactly k nonzero
elements.

2) For each possible k-length support there are at least
k + 1 columns in S.

3) Any k + 1 columns in S, which have the same
support, span a k-dimensional space.

4) Any k + 1 columns in S, which have different
supports, span a (k + 1)-dimensional space.

According to the second of the richness conditions the
number of signals, that is the number of columns in S, must
be at least $\binom{m}{k}(k + 1)$. Nevertheless, it was shown in [17]
that in practice far fewer signals are needed. Heuristically, the
number of signals should grow at least linearly with the length
of the signals. It was also shown in [17] that DL algorithms
perform well even when there are at most k nonzero elements
in the columns of S instead of exactly k.
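To get a feel for how demanding the second richness condition is, the short sketch below counts the number of columns it requires: one group of k + 1 columns for each of the possible k-element supports among the m rows of S. The function name `min_signals_richness` is illustrative, and the values m = 64, k = 6 are borrowed from the simulation setup of Section IV-C purely as an example.

```python
from math import comb

def min_signals_richness(m, k):
    """Columns needed so that every k-element support appears at least k + 1 times."""
    return comb(m, k) * (k + 1)

# Example sizes (an assumption for illustration): length-64 signals, 6 nonzeros each.
required = min_signals_richness(64, 6)
print(required)  # 524820576
```

More than half a billion signals would be needed for this modest problem size, which puts the remark that far fewer signals suffice in practice into perspective.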

B. BCS Uniqueness 

Under the conditions above the DL solution given the
measurements B is unique. That is, up to scaling and signed
permutations there is a unique pair D, S such that B = DS
and S is k-sparse. Since we are interested in the product PS
and not in P or S themselves, without loss of generality
we can always assume that the columns of P are scaled
so that the columns of D = AP have unit norm. This
way there is no ambiguity in the scaling of D and S, but
only in their signed permutation. That is, applying DL on
B provides $\hat{D} = APQ$ and $\hat{S} = Q^T S$ for some unknown
signed permutation matrix Q. A signed permutation matrix is
a column (or row) permutation of the identity matrix, where
the sign of each column (or row) can change separately. In
other words, it has only one nonzero element, equal to ±1, in
each column and each row. Any signed permutation matrix is
obviously orthogonal.
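The signed permutation ambiguity is easy to reproduce numerically. The sketch below builds a random signed permutation matrix Q, confirms that it is orthogonal, and confirms that the pair (DQ, Q^T S) yields the same product as (D, S). The helper name `random_signed_permutation` and the small dimensions are assumptions made for illustration.

```python
import numpy as np

def random_signed_permutation(m, rng):
    """Identity matrix with permuted columns and an independent sign per column."""
    Q = np.zeros((m, m))
    perm = rng.permutation(m)                  # row index of the nonzero in each column
    signs = rng.choice([-1.0, 1.0], size=m)
    Q[perm, np.arange(m)] = signs
    return Q

rng = np.random.default_rng(2)
n, m, N = 4, 6, 5
D = rng.standard_normal((n, m))
S = rng.standard_normal((m, N))
Q = random_signed_permutation(m, rng)

D_alt, S_alt = D @ Q, Q.T @ S                  # an equally valid DL factorization
print(np.allclose(Q.T @ Q, np.eye(m)), np.allclose(D_alt @ S_alt, D @ S))
```

Because the rows of S are permuted and sign-flipped consistently with the columns of D, a DL algorithm has no way to tell the two factorizations apart from B alone.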

If we can find the basis $\hat{P} = PQ$ out of $\hat{D}$, then we can
recover the correct signal matrix by:

\[
\hat{P}\hat{S} = PQQ^T S = PS = X.
\]

Therefore, under the uniqueness conditions for DL on S and
D = AP, Problem 2 is equivalent to the following problem.

Problem 3. Given $\hat{D} \in \mathbb{R}^{n \times m}$ and $A \in \mathbb{R}^{n \times m}$, where n < m,
find a basis $\hat{P}$ such that $\hat{D} = A\hat{P}$.

We therefore focus on the uniqueness of Problem 3. Since
n < m the matrix A has a null space. As we now show, even
with the constraint that $\hat{P}$ is a basis there is still no unique
solution.

To see this, assume $P_1$ is a basis, i.e., has full rank, and
satisfies $\hat{D} = AP_1$. Decompose $P_1$ as $P_1 = P_{N^\perp} + P_N$, where
the columns of $P_N$ are in N(A), the null space of A, and
those of $P_{N^\perp}$ are in its orthogonal complement $N(A)^\perp$. Note
that necessarily $P_N \neq 0$; otherwise the matrix $P_1 = P_{N^\perp}$ would
have all its columns in $N(A)^\perp$ and full rank. However, since the dimension
of $N(A)^\perp$ is at most n < m, it contains at most n linearly
independent vectors. Therefore, there is no $m \times m$ full rank
matrix whose columns are all in $N(A)^\perp$.

Next define the matrix $P_2 = P_{N^\perp} - P_N$, which is different
from $P_1$. Since $AP_N = 0$, it is easy to see that $\hat{D} = AP_2$. Moreover, since
the columns of $P_N$ are perpendicular to the columns of $P_{N^\perp}$,

\[
P_1^T P_1 = P_2^T P_2 = P_{N^\perp}^T P_{N^\perp} + P_N^T P_N.
\]

A square matrix P has full rank if and only if $P^T P$ has full
rank. Therefore, since $P_1$ has full rank and $P_2^T P_2 = P_1^T P_1$,
$P_2$ also has full rank, so that both $P_1$ and $P_2$ are solutions
to Problem 3. In fact there are many more solutions; some of
them can be found by changing the signs of only part of the
columns of $P_N$.
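The construction in this argument can be reproduced numerically. The sketch below splits a random full-rank matrix into its components in the null space of A and in the orthogonal complement, flips the sign of the null-space part, and checks that the result is a different full-rank basis with the same product under A. The use of the pseudoinverse to form the projector and the dimensions n = 3, m = 6 are implementation choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 6
A = rng.standard_normal((n, m))
P1 = rng.standard_normal((m, m))       # a generic full-rank basis

# pinv(A) @ A is the orthogonal projector onto the row space of A, i.e. N(A)^perp.
proj_perp = np.linalg.pinv(A) @ A
P_perp = proj_perp @ P1                # component with columns in N(A)^perp
P_null = P1 - P_perp                   # component with columns in N(A)
P2 = P_perp - P_null                   # sign of the null-space part flipped

same_product = np.allclose(A @ P1, A @ P2)
same_gram = np.allclose(P1.T @ P1, P2.T @ P2)
rank_P2 = np.linalg.matrix_rank(P2)
print(same_product, same_gram, rank_P2 == m, not np.allclose(P1, P2))
```

Both P1 and P2 explain the same dictionary, so the measurements cannot distinguish between them without an additional constraint on the basis.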

We now return to the original BCS problem, as defined in
Problem 2. We just proved that when the DL solution given B
is unique, Problem 2 is equivalent to Problem 3, which has no
unique solution. Obviously, if the DL solution given B is not
unique, then BCS will not be unique. Therefore, Problem 2
has no unique solution for any choice of parameters.

In order to guarantee a unique solution we need an additional
constraint. We next discuss constraints on P that
can render the solution to Problem 3 unique, and which therefore,
together with the richness conditions on S and the spark
condition on AP, guarantee the uniqueness of the solution
to Problem 2. Although there are many possible constraints,
we focus below on the following.

1) P is one of a finite and known set of bases. 

2) P is sparse under some known dictionary. 

3) P is orthogonal and has a block diagonal structure. 

The motivation for these constraints comes from the uniqueness
of Problem 3. Nonetheless, we provide conditions under
which the solution to Problem 2 with constraints 1 or 2
is unique even without DL uniqueness. In fact, under these
conditions the solution to Problem 2 is unique even when
N = 1, so that there is only one signal.

In the next sections we consider each one of the constraints,
prove conditions for the uniqueness of the constrained BCS
solution, and suggest a method to retrieve the solution. Table I
summarizes these three approaches.

IV. Finite Set of Bases 

One way to guarantee a unique solution to Problem 3 is to
limit the number of possible bases P to a finite set of bases,
and require that these bases are different from one another
under the measurement matrix A. Since $\hat{P}$ in Problem 3 is a
column signed permutation of P in Problem 2, by limiting P
to a finite set we also limit the possible $\hat{P}$ to a finite set. The
new constrained BCS, instead of Problem 2, is then:

Problem 4. Given the measurements B, the measurement
matrix A and a finite set of bases $\Psi$, find the signal matrix X
such that B = AX and X = PS for some basis $P \in \Psi$ and
k-sparse matrix S.

The motivation behind Problem 4 is that over the years a
variety of bases were proven to lead to sparse representations
of many natural signals, such as wavelet [25] and DCT [26].
These bases have fast implementations and are known to fit
many types of signals. Therefore, when the basis is unknown
it is natural to try one of these choices.

TABLE I
SUMMARY OF CONSTRAINTS ON P

| The constraint | Conditions for uniqueness | Algorithm |
|---|---|---|
| Finite Set (Section IV): P is in a given finite set of possible bases $\Psi$. | $\sigma(AP) > 2k$ for any $P \in \Psi$; A is k-rank preserving of $\Psi$ (Definition 5). | F-BCS: solving (6) or (7) for each $P \in \Psi$ using a standard CS algorithm, and choosing the best solution. |
| Sparse Basis (Section V): P is $k_P$-sparse under a given dictionary $\Phi$. | $\sigma(A\Phi) > 2 k_P k$. | Direct method: solving (9) or (10) using a standard CS algorithm, where the recovery is $X = \Phi C$. Sparse K-SVD: using the sparse K-SVD algorithm [23] to retrieve S, Z, where the recovery is $X = \Phi Z S$. |
| Structure (Section VI): P is orthogonal 2L-block diagonal. | The richness conditions on S; A is a union of L orthogonal bases; $\sigma(AP) = n + 1$; A is not inter-block diagonal (Definition 10). | OBD-BCS: updating S and P alternately according to the algorithm in Table IV, where the recovery is $X = PS$. |



A. Uniqueness Conditions 

We now show that under proper conditions the solution
to Problem 4 is unique even when there is only one signal,
namely N = 1. In this case instead of the matrices X, S, B
we deal with the vectors x, s, b respectively.

Assume $\bar{x}$ is a solution to Problem 4. That is, $\bar{x}$ is k-sparse
under $\bar{P} \in \Psi$ and satisfies $b = A\bar{x}$. Uniqueness is achieved
if there is no $\hat{x} \neq \bar{x}$ which is k-sparse under a basis $\hat{P} \in \Psi$
and also satisfies $b = A\hat{x}$. We first require that $\sigma(A\bar{P}) > 2k$;
otherwise even if $\hat{P} = \bar{P}$ there is no unique solution [9].
Since the real sparsity basis $\bar{P}$ is unknown we require that
$\sigma(AP) > 2k$ for any $P \in \Psi$.

Next we write $\bar{x} = \bar{P}\bar{s} = \bar{P}_T \bar{s}_T$, where T is the index
set of the nonzero elements in $\bar{s}$ with $|T| \leq k$, $\bar{s}_T$ is the
vector of nonzero elements in $\bar{s}$, and $\bar{P}_T$ is the sub-matrix
of $\bar{P}$ containing only the columns with indices in T. If $\hat{x}$ is
also a solution to Problem 4 then $\hat{x} = \hat{P}\hat{s} = \hat{P}_J \hat{s}_J$, where J
is the index set of the nonzero elements in $\hat{s}$, and $|J| \leq k$.
Moreover, $b = A\hat{P}_J \hat{s}_J = A\bar{P}_T \bar{s}_T$, which implies that the
matrix $A[\bar{P}_T, \hat{P}_J]$ has a null space. This null space contains
the null space of $[\bar{P}_T, \hat{P}_J]$. By requiring

\[
\operatorname{rank}(A[\bar{P}_T, \hat{P}_J]) = \operatorname{rank}[\bar{P}_T, \hat{P}_J], \tag{5}
\]

we guarantee that the null space of $A[\bar{P}_T, \hat{P}_J]$ equals the null
space of $[\bar{P}_T, \hat{P}_J]$. Therefore, under (5), $A\hat{P}_J \hat{s}_J = A\bar{P}_T \bar{s}_T$ if
and only if $\hat{P}_J \hat{s}_J = \bar{P}_T \bar{s}_T$, which implies $\hat{x} = \bar{x}$.

Therefore, in order to guarantee the uniqueness of the
solution to Problem 4, in addition to the requirement that
$\sigma(AP) > 2k$ for any $P \in \Psi$, we require that any two index
sets T, J of size k and any two bases $\bar{P}, \hat{P} \in \Psi$ satisfy (5).

Definition 5. A measurement matrix A is k-rank preserving
of the bases set $\Psi$ if any two index sets T, J of size k and any
two bases $\bar{P}, \hat{P} \in \Psi$ satisfy (5).
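The rank condition (5) behind Definition 5 can be tested directly for a given pair of bases and index sets. The following sketch does so with NumPy; the function name `satisfies_rank_condition`, the choice of the identity and a random orthogonal matrix as the two bases, and the sizes m = 8, k = 2 are assumptions for illustration. It also shows the condition failing once the number of measurements drops below 2k.

```python
import numpy as np

def satisfies_rank_condition(A, P_bar, P_hat, T, J):
    """Check rank(A [P_bar_T, P_hat_J]) == rank([P_bar_T, P_hat_J]), i.e. condition (5)."""
    stacked = np.hstack([P_bar[:, T], P_hat[:, J]])
    return np.linalg.matrix_rank(A @ stacked) == np.linalg.matrix_rank(stacked)

rng = np.random.default_rng(4)
m, k = 8, 2
P_bar = np.eye(m)                                     # first candidate basis
P_hat, _ = np.linalg.qr(rng.standard_normal((m, m)))  # second candidate basis
T, J = [0, 1], [2, 3]                                 # two index sets of size k

A = rng.standard_normal((2 * k, m))                   # n = 2k Gaussian measurements
ok_with_4_rows = satisfies_rank_condition(A, P_bar, P_hat, T, J)
ok_with_3_rows = satisfies_rank_condition(A[:3], P_bar, P_hat, T, J)
print(ok_with_4_rows, ok_with_3_rows)
```

Verifying that A is k-rank preserving for a whole set of bases would repeat this check over all pairs of bases and all pairs of index sets, which is combinatorial in general.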

The conditions for the uniqueness of the solution to Problem 4
are therefore: $\sigma(AP) > 2k$ for any $P \in \Psi$, and A
is k-rank preserving of the set $\Psi$. In order to satisfy the
first condition with probability 1, according to Section II-A
we can require all $P \in \Psi$ to be orthogonal and generate A
from an i.i.d. Gaussian distribution. However, since the number
of bases is finite, we can instead verify that the first condition
is satisfied by checking the spark of all the products AP.
Alternatively, one can bound the spark of these matrices using
their mutual coherence.

It is easy to see that any full column rank matrix A is k-rank
preserving for any k and any set $\Psi$. However, in our
case A has fewer rows than columns and therefore does not have full column
rank. In order to guarantee that A is k-rank preserving with
probability 1 we rely on the following proposition:

Proposition 6. An i.i.d. Gaussian matrix A of size $n \times m$ is,
with probability 1, k-rank preserving of any fixed finite set of
bases and any k < n/2.

Proof: If $n \geq m$ then A has full column rank with probability 1,
and is therefore k-rank preserving with probability 1. We
therefore focus on the case where n < m. Assume T, J are
index sets of size k, and $\bar{P}, \hat{P} \in \Psi$. Denote $r = \operatorname{rank}[\bar{P}_T, \hat{P}_J]$.
We then need to prove that $\operatorname{rank}(A[\bar{P}_T, \hat{P}_J]) = r$.

Perform a Gram-Schmidt process on the columns of
$[\bar{P}_T, \hat{P}_J]$ and denote the resulting matrix by G. G is then an
$m \times r$ matrix with orthonormal columns, with rank(G) = r
and $\operatorname{rank}(AG) = \operatorname{rank}(A[\bar{P}_T, \hat{P}_J])$. Next we complete G to
an orthogonal matrix $G_u$ by adding columns. According to
Proposition 1, since A is an i.i.d. Gaussian matrix and $G_u$ is
orthogonal, $\sigma(AG_u) = n + 1$ with probability 1. Therefore, with
probability 1 any t columns of $AG_u$ are linearly independent,
with $t \leq n$. In particular, since $r \leq 2k \leq n$, with probability 1 the columns of AG
are linearly independent, so that rank(AG) = r, completing
the proof. ■

Until now we proved conditions for the uniqueness of
Problem 2 when there is only one signal, N = 1. The same
conditions are true for N > 1 since we can look at every signal
separately. However, since all the signals are sparse under the
same basis, if N > 1 then the condition that A must be k-rank
preserving can be relaxed.

For instance, consider the case where there are only two
index sets T, J and two bases $\bar{P}, \hat{P} \in \Psi$ ($\bar{P}$ is the real
sparsity basis) that do not satisfy (5). In this case if we have
many signals with different sparsity patterns, then only a small
portion of them fall in the problematic index set, and therefore
might falsely indicate that $\hat{P}$ is the sparsity basis. However,
most of the signals correspond to index sets that satisfy (5),
and therefore these signals indicate the correct basis. The
selection of the sparsity basis is done according to the majority
of signals and therefore the correct basis is selected.

Another example is the case where there are enough diverse
signals such that the richness conditions on S are satisfied.
In this case it is enough to require that for any two bases
$\bar{P}, \hat{P} \in \Psi$ the matrices $A\bar{P}$ and $A\hat{P}$ are different from one
another even under scaling and signed permutation of the
columns. This way we guarantee that the problem equivalent
to Problem 4 under the richness and spark conditions has a
unique solution, and therefore Problem 4 also has a unique
solution.

Problem 4 can also be viewed as a CS problem with a block
sparsity constraint [34], O. That is, if $\Psi = \{P_1, P_2, \ldots\}$ then
the desired signal matrix can be written as

\[
X = [P_1, P_2, \ldots] \begin{bmatrix} S_1 \\ S_2 \\ \vdots \end{bmatrix},
\]

where only one of the submatrices $S_i$ is not all zeros. In contrast
to the usual block sparsity constraint, here the sub-matrix
$S_i$ which is not zero is itself sparse. However, the uniqueness
conditions which are implied by this block sparsity CS
approach are too strong compared to our BCS approach. For
instance, they require all $P_i \in \Psi$ to be incoherent, whereas
the BCS uniqueness is not disturbed by coherent bases. In fact
the solution is unique even if the bases in $\Psi$ equal one another.
This is because here we are not interested in recovering $S_i$ but
rather $P_i S_i$.

TABLE II
F-BCS SIMULATION RESULTS

B. The F-BCS Method 

The uniqueness conditions we discussed lead to a straight- 
forward method for solving Problem|4] We refer to this method 
as F-BCS which stands for finite BCS. When N = 1, F-BCS 
solves a CS problem for each P £ 4* 



s = argmm ||s||o 



s.t. b = AP 



s, 



(6) 



and chooses the sparsest s. Under the uniqueness conditions 
it is the only one with no more than k nonzero elements. 
Therefore if we know the sparsity level k we can stop the 
search when we found a sparse enough s. The recovered signal 
is x = Ps where P is the basis corresponding to the s we 
chose. When k is known an alternative method is to solve for 
each P £ * 



s = arg min \ \b — APt 



|o S.t. 



so 



< k, 



(7) 



and choose s that minimizes ||6 — APs|||. In the noiseless 
case this minimum is zero for the correct basis P. 

When N > 1 we can solve either (6) or (7) for each of the 
signals and select the sparsity basis according to the majority. 

The solutions to problems (6) and (7) can be approximated 
using one of the standard CS algorithms. Since these algorithms 
are suboptimal, there is no guarantee that they provide 
the correct solution $\hat{x}$, even for the correct basis P. In general, 
when k is small enough relative to n these algorithms are 
known to perform very well. Moreover, when N > 1, P is 
selected according to the majority of signals, and therefore if 
the CS algorithm did not work well on a few of the signals it 
will not affect the recovery of the rest of the signals. 
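
The F-BCS procedure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the simulations: the helper names `omp` and `f_bcs` are hypothetical, the OMP routine is a basic textbook variant, the candidate set contains the identity and two random orthogonal bases (rather than DCT and wavelet bases), and the residual of (7) is used to let each signal vote for a basis.

```python
import numpy as np
from collections import Counter

def omp(D, b, k, tol=1e-10):
    """Basic OMP: greedy approximation of argmin ||b - D s||_2 s.t. ||s||_0 <= k."""
    norms = np.linalg.norm(D, axis=0)
    residual, support = b.copy(), []
    s = np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = -1.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
        s[:] = 0.0
        s[support] = coef
    return s

def f_bcs(A, B, bases, k):
    """Each column of B votes for the basis with the smallest residual in (7)."""
    votes = []
    for b in B.T:
        residuals = [np.linalg.norm(b - A @ P @ omp(A @ P, b, k)) for P in bases]
        votes.append(int(np.argmin(residuals)))
    winner = Counter(votes).most_common(1)[0][0]   # majority decision
    P = bases[winner]
    S_hat = np.column_stack([omp(A @ P, b, k) for b in B.T])
    return winner, P @ S_hat

# Synthetic check (assumed sizes: 32 measurements of length-64 signals, k = 3).
rng = np.random.default_rng(0)
m, n, k, N = 64, 32, 3, 15
bases = [np.eye(m)] + [np.linalg.qr(rng.standard_normal((m, m)))[0] for _ in range(2)]
S = np.zeros((m, N))
for j in range(N):
    S[rng.choice(m, size=k, replace=False), j] = rng.standard_normal(k)
X = bases[1] @ S                      # signals are sparse under the second basis
A = rng.standard_normal((n, m))
winner, X_hat = f_bcs(A, A @ X, bases, k)
rel_err = np.linalg.norm(X - X_hat, axis=0) / np.linalg.norm(X, axis=0)
print("selected basis:", winner, "median relative error:", float(np.median(rel_err)))
```

Voting per signal and then re-solving with the winning basis mirrors the majority rule described above: a few signals on which OMP fails do not change the selected basis.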
C. F-BCS Simulation Results 

We now demonstrate the F-BCS method in simulation. We 
chose the set of bases $\Psi$ to contain 5 bases of size 64 x 64: 
the identity, DCT [26], Haar wavelet, Symlet wavelet and 
Biorthogonal wavelet [25]. 100 signals of length 64 were 
created randomly by generating random sparse vectors and 
multiplying them by the Biorthogonal wavelet basis in $\Psi$. Each 
sparse vector contained up to 6 nonzero elements in uniformly 
random locations, and values from a normal distribution. 

The measurement matrix A was an i.i.d. Gaussian matrix of 
size 32 x 64. The measurements were calculated first without 
noise, that is B = AX, and then with additive Gaussian noise 
with varying SNR from 30dB to 5dB. For each noise level the 
F-BCS method was performed, where the CS algorithm we 
used was OMP [13]. 

Table [II] summarizes the results. For all noise levels the 
basis selection according to the majority was correct. The miss 
detected column in the table contains the percentage of signals 
that indicated a false basis. The average error column contains 
the average reconstruction error, calculated as the average of 



(8) 



where $x_i$ and $\hat{x}_i$ are the columns of the real signal matrix X and 
the reconstructed signal matrix $\hat{X}$, respectively. The average 
is performed only on the signals that indicated the correct 
basis. The reconstruction of the rest of the signals obviously 
failed. As can be seen from Table II, in the noiseless case the 
recovery is perfect and the error grows with the noise level. For 
high SNR there are no false reconstructions, but as the SNR 
decreases below 15dB the percentage of false reconstructions 
increases. In these cases, one should use more than one signal, 
so that if one of the signals fails there will be an indication 
of this through the rest of the signals. 

Another simulation we performed investigated the influence 
of the sparsity level k, which is the number of nonzero 
elements in S. The settings of this simulation were the same 
as those of the first simulation, only this time there was 
no noise added to the measurements, and k was gradually 
increased from 1 to 32. For each sparsity level new signals 
were generated with the same sparsity basis and measured by 
the same measurement matrix. For k < 8 the recovery of the 
signal was perfect, but as expected, for higher values of k the 
number of false reconstructed signals and the average error 
grew. The reason for this is that the OMP algorithm works 
well with small values of k; for higher values of k, even if the 
uniqueness conditions are still satisfied, the OMP algorithm 
may not find the correct solution. 

V. Sparse Basis 

A different constraint that can be added to Problem 2 in 
order to reduce the number of solutions is the sparsity of the 
basis P. That is, we assume that the columns of the basis 
P are sparse under some known dictionary $\Phi$, so that there 
exists some unknown sparse matrix Z such that $P = \Phi Z$. We 
assume the number of nonzero elements in each column of Z 
is known to equal $k_p$. We refer to $\Phi$ as a dictionary since it 
does not have to be square. Note that in order for P to be a 
basis $\Phi$ must have full row rank, and Z must have full column 
rank. 

The constrained BCS in this case is then: 

Problem 7. Given the measurements B, the measurement 
matrix A and the dictionary $\Phi$, which has full row rank, find 
the signal matrix X such that B = AX where $X = \Phi Z S$ for 
some k-sparse matrix S and $k_p$-sparse and full column rank 
matrix Z. 

This problem is similar to that studied in [23] in the context 
of sparse DL. The difference is that [23] finds the matrices 
Z, S, while we are only interested in their product. The 
motivation behind Problem 7 is to overcome the disadvantage 
of the previously discussed Problem 4, in which the bases are 
fixed. When using a sparse basis we can choose a dictionary 
$\Phi$ with fast implementation, but enhance its adaptability to 
different signals by allowing any sparse enough combination 
of the columns of $\Phi$. Note that we can solve the problem 
separately for several different dictionaries $\Phi$, and choose the 
best solution. This way we can combine the sparse basis 
constraint and the constraint of a finite set of bases. Another 
possible combination between these two approaches is to 
define the basic dictionary as $\Phi = [P_1, P_2, \ldots]$, where the 
finite set of bases is $\Psi = \{P_1, P_2, \ldots\}$. This way we allow 
any sparse enough combination of columns from all the bases 
in $\Psi$. 

A. Uniqueness Conditions 

As we now show, here too under appropriate conditions the 
constrained problem has a unique solution even when there is 
only one signal, N = 1. Therefore, instead of matrices X, S, B 
we deal with vectors x, s, b respectively. Since $\|s\|_0 \le k$ and 
Z is $k_p$-sparse, the vector c = Zs necessarily satisfies $\|c\|_0 \le k_p k$. 
Therefore, Problem 7 can be addressed by solving 

$$\hat{c} = \arg\min_c \|c\|_0 \quad \text{s.t.} \quad b = A\Phi c, \qquad (9)$$

or equivalently: 

$$\hat{c} = \arg\min_c \|b - A\Phi c\|_2^2 \quad \text{s.t.} \quad \|c\|_0 \le k_p k, \qquad (10)$$

where the recovery is $\hat{x} = \Phi\hat{c}$. The solutions to (9) and (10) 
are unique if $\sigma(A\Phi) > 2k_p k$. If there is more than one signal, 
N > 1, then one can solve (9) or (10) for each signal 
separately. 

Note that in Problem 7 the matrix Z necessarily has full 
column rank, while this constraint is dropped in (9) and (10). 
However, if the solution without this constraint is unique then 
obviously the solution with this constraint is also unique. 
Therefore, a sufficient condition for the uniqueness of Problem 7 
is $\sigma(A\Phi) > 2k_p k$. 



B. Algorithms For Sparse BCS 

1) Direct Method: When there is only one signal, according 
to the uniqueness discussion, the solution to Problem 7 can 
be found by solving either (9) or (10) using a standard CS 
algorithm. When there are more signals the same process 
can be performed for each signal separately. Since we use a 
standard CS algorithm, for this method to succeed we require 
the product $k_p k$ to be small relative to n. 
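
A minimal sketch of the direct method, under assumptions that are not taken from the paper's experiments: a basic OMP routine (the helper `omp` is hypothetical), an orthonormal DCT matrix built by hand as the dictionary, and small sizes (64-dimensional signals, 32 measurements, two nonzeros per column of Z and of S). The decoder never sees Z; it only looks for a vector c with at most k_p k nonzeros and returns the product of the dictionary with c. The full column rank constraint on Z is not enforced here, in line with its being dropped in (9) and (10).

```python
import numpy as np

def omp(D, b, k, tol=1e-10):
    """Basic OMP: greedy approximation of argmin ||b - D c||_2 s.t. ||c||_0 <= k."""
    norms = np.linalg.norm(D, axis=0)
    residual, support = b.copy(), []
    c = np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(D.T @ residual) / norms
        if support:
            corr[support] = -1.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], b, rcond=None)
        residual = b - D[:, support] @ coef
        c[:] = 0.0
        c[support] = coef
    return c

rng = np.random.default_rng(1)
m, n, k_p, k, N = 64, 32, 2, 2, 15      # assumed sizes for illustration

# Orthonormal DCT matrix used as the known dictionary Phi.
i, j = np.arange(m)[:, None], np.arange(m)[None, :]
Phi = np.sqrt(2.0 / m) * np.cos(np.pi * (2 * i + 1) * j / (2 * m))
Phi[:, 0] /= np.sqrt(2.0)

def sparse_columns(rows, cols, nnz):
    """Random matrix with exactly nnz nonzero entries in every column."""
    M = np.zeros((rows, cols))
    for col in range(cols):
        M[rng.choice(rows, size=nnz, replace=False), col] = rng.standard_normal(nnz)
    return M

Z = sparse_columns(m, m, k_p)           # unknown to the decoder
S = sparse_columns(m, N, k)
X = Phi @ Z @ S
A = rng.standard_normal((n, m))
B = A @ X

# Direct method: solve (10) per column with sparsity budget k_p * k.
C_hat = np.column_stack([omp(A @ Phi, b, k_p * k) for b in B.T])
X_hat = Phi @ C_hat
rel_err = np.linalg.norm(X - X_hat, axis=0) / np.linalg.norm(X, axis=0)
print("median relative error:", float(np.median(rel_err)))
```

The only change relative to ordinary CS is the sparsity budget: the greedy solver is asked for k_p k atoms of the product of A and the dictionary instead of k atoms of AP.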

2) Sparse K-SVD: The sparse K-SVD algorithm [23] is a 
DL algorithm that seeks a sparse dictionary. That is, given the 
measurements B and a base dictionary D it finds a $k_p$-sparse 
Z and a k-sparse S, such that B = DZS. In our case we can 
run sparse K-SVD on B with $D = A\Phi$ in order to find Z 
and S, and then recover the signals by $X = \Phi Z S$. The sparse 
K-SVD algorithm is a variation of the K-SVD algorithm [24], 
which is a popular DL algorithm. Sparse K-SVD consists of 
two alternating steps. The first is sparse coding, in which Z 
is fixed and S is updated using a standard CS algorithm. The 
second step is dictionary update, in which the support of S is 
fixed and Z is updated together with the values of the nonzero 
elements in S. The difference between sparse K-SVD and 
K-SVD is only in the dictionary update step. Since sparse 
K-SVD is a DL algorithm, it requires a large number of diverse 
signals. Moreover, the required diversity of the signals can 
prevent the algorithm from working, for instance in cases of 
block sparsity. 

In general, BCS cannot be solved using DL methods. 
However, under the sparse basis constraint BCS is reduced to a 
problem that can be viewed as constrained DL, and therefore 
solved using sparse K-SVD. Nevertheless, Problem 7 is not 
exactly constrained DL, since in DL we seek the matrices 
S and Z themselves, whereas here we are interested only in 
their product $X = \Phi Z S$. Moreover, as in any DL algorithm, 
for sparse K-SVD to perform well it requires many diverse 
signals. However, for the uniqueness of Problem 7 or for 
the direct method of solution, there is no need for such a 
requirement. The sparse K-SVD algorithm is also much more 
complicated than the direct method. 

Nonetheless, sparse K-SVD has one advantage over the 
direct method in solving Problem 7. The direct method uses a 
standard CS algorithm in order to find C = ZS, which is 
$k_p k$-sparse. This algorithm provides the correct result only if the 
product $k_p k$ is small enough relative to n. On the other hand, 
the standard CS algorithms used in sparse K-SVD attempt to 
find separately S, which is k-sparse, and Z, which is $k_p$-sparse, 
and therefore require k and $k_p$ themselves to be small instead 
of the product $k_p k$. Thus, when there are few signals, or even 
just one, and when $k_p k$ is small relative to n, Problem 7 
should be solved using the direct method. If $k_p k$ is large but 
still satisfies $\sigma(A\Phi) > 2k_p k$, and if there are enough diverse 
signals, then sparse K-SVD should be used. 

TABLE III 
RECONSTRUCTION ERROR FOR DIFFERENT NOISE LEVELS 

Fig. 1. Reconstruction error as a function of the sparsity level. 



C. Simulation Results 

Simulation results for sparse K-SVD can be found in [23]. 
Here we present simulation results for the direct method. First 
of all we tested the influence of the sparsity level of the basis. 
We generated a random sparse matrix Z of size 256 x 256 
with up to $k_p = 6$ nonzero elements in each column. The value 
of k, the number of nonzero elements in each column of S, was gradually 
increased from 1 to 20. For each k we generated S as a random 
k-sparse matrix of size 256 x 100, and created the signal matrix 
$X = \Phi Z S$, where $\Phi$ was the DCT basis. X was measured 
using a random Gaussian matrix A of size 128 x 256, resulting 
in B = AX. 

We solved Problem 7 given A and B using the direct 
method, where again the CS algorithm we used was OMP. 
For comparison we also performed OMP with the real basis P, 
which is unknown in practice. Fig. 1 summarizes the results. For 
every value of k the error of each of the graphs is an average 
over the reconstruction errors of all the signals, calculated as 
in (8). The two errors are similar for k < 8, but for larger values of k 
the error of the blind method is much higher. 

Since A is an i.i.d. Gaussian matrix and the DCT matrix 
is orthogonal, with probability 1 $\sigma(A\Phi) = 129$. Therefore 
with probability 1 the uniqueness of the sparse BCS method 
is achieved as long as $k_p k \le 64$, or $k \le 10$. The error began to 
grow before this sparsity level because OMP is a suboptimal 
algorithm that is not guaranteed to find the solution even when 
it is unique, but works well on sparse enough signals. The 
reconstruction error of the OMP which used the real P grows 
much less for the same values of k. This is because in this case 
k itself, instead of $k_p k$, should be small relative to n. 
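
For reference, the bound on k quoted above follows from a short calculation with the simulation's values (128 measurements, $k_p = 6$), assuming the spark of the 128 x 256 matrix attains its maximal value of 129:

```latex
\sigma(A\Phi) = 128 + 1 = 129, \qquad
\sigma(A\Phi) > 2 k_p k
\;\Longleftrightarrow\; k_p k \le 64
\;\Longleftrightarrow\; k \le \left\lfloor \tfrac{64}{6} \right\rfloor = 10 .
```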

Sparse K-SVD can improve the results for high values of k, 
assuming of course that k is small enough for the solution to be 
unique. However, in this simulation the number of signals is 
even less than the length of the vectors, and sparse K-SVD 
does not work well with such a small number of signals. In 
the sparse K-SVD simulations presented in [23] 
the number of signals is at least 100 times the length of the 
signals. 

We also investigated the influence of noise on the algorithm. 
The settings of this simulation were the same as in the previous 
simulation, only this time we fixed k = 3 and added Gaussian 
noise to the measurements B. We looked at different noise 
levels, and for each level we ran the direct method for sparse 
BCS, and also for comparison we ran an OMP algorithm 
which used the real basis P. Table III summarizes the average 
errors of each of the methods. In the noiseless case there is 
a perfect recovery in both cases. As the SNR decreases both 
errors increase, but as can be expected, that of the sparse BCS 
grows faster. The reason for the big difference in the low SNR 
cases is again the fact that in the CS case the OMP algorithm 
is performed on sparser signals, relative to the sparse BCS 
case. 

SNR       CS            sparse BCS 
∞         10^{-14} %    10^{-14} % 
30dB      1.2%          2.8% 
25dB      1.5%          5.8% 
20dB      3.3%          11.9% 
15dB      7.1%          23.5% 

VI. Structural Constraint 

The last constraint we discuss is a structural constraint on 
the basis P. We require P to be block diagonal and orthogonal. 
The motivation for the block diagonal constraint comes from 
Problem 3, which looks for $\tilde{P}$ such that $D = A\tilde{P}$. Assume 
for the moment that $\tilde{P}$ is block diagonal, such that: 

$$\tilde{P} = \begin{bmatrix} \tilde{P}_1 & & \\ & \ddots & \\ & & \tilde{P}_L \end{bmatrix},$$

and A is chosen to be a union of orthonormal bases, as in ETTl . 
l2l . |3Q|-|32|. That is, A = [A U ...A L ] where Ai,...,A L are 
all orthonormal matrices. In this case 



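$$D = [D_1, \ldots, D_L] = [A_1 \tilde{P}_1, \ldots, A_L \tilde{P}_L],$$

and we can simply recover $\tilde{P}$ by: 

$$\tilde{P} = \begin{bmatrix} A_1^T D_1 & & \\ & \ddots & \\ & & A_L^T D_L \end{bmatrix}. \qquad (11)$$
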



Therefore, the solution to Problem [3] under the constraint that 
P is block diagonal is very simple. 
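
The recovery in (11) can be checked numerically. The following sketch uses assumed toy sizes (three 4 x 4 blocks) and random orthogonal blocks for A; the helper `rand_orth` is a hypothetical name. Each diagonal block is obtained by multiplying the corresponding block of D by the transpose of the matching block of A.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 4, 3                                   # assumed toy sizes

def rand_orth(d):
    """Random orthogonal d x d matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

A_blocks = [rand_orth(n) for _ in range(L)]   # A = [A_1, ..., A_L]
P_blocks = [rng.standard_normal((n, n)) for _ in range(L)]

A = np.hstack(A_blocks)                       # n x nL
P = np.zeros((n * L, n * L))                  # block diagonal basis
for i, Pi in enumerate(P_blocks):
    P[i * n:(i + 1) * n, i * n:(i + 1) * n] = Pi

D = A @ P                                     # D = [A_1 P_1, ..., A_L P_L]

# Equation (11): the i-th diagonal block is A_i^T D_i, since A_i^T A_i = I.
P_rec = np.zeros_like(P)
for i, Ai in enumerate(A_blocks):
    P_rec[i * n:(i + 1) * n, i * n:(i + 1) * n] = Ai.T @ D[:, i * n:(i + 1) * n]

print(np.allclose(P_rec, P))
```

Only the orthogonality of the blocks of A is used; the diagonal blocks of the basis can be arbitrary in this step.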

Under the richness and spark conditions the BCS problem, 
as defined in Problem 2, is equivalent to Problem 3, where the 
basis $\tilde{P}$ in Problem 3 is a column signed permutation of the 
basis P in Problem 2. Since we are interested in the solution to 
Problem 2, the constraint should be on the basis P instead of 
$\tilde{P}$. However, if we constrain P to be block diagonal, then the 
solution to the equivalent Problem 3 is not as simple as in (11). 
In Problem 3 we look for $\tilde{P} = PQ$, for some unknown signed 
permutation matrix Q. Under the block diagonal constraint on 
P the matrix $\tilde{P} = PQ$ is not necessarily block diagonal, and 
therefore we cannot use (11) to recover it. 






We can guarantee that $\tilde{P}$ is block diagonal only if we can 
guarantee that Q is block diagonal. That is, Q permutes only 
the columns inside each block of P, and does not mix the 
blocks or change their outer order. As we prove below 
in the uniqueness discussion, this can be guaranteed if we 
require P to have more blocks than A. Specifically, we require 
P to have 2L blocks, which is twice the number of blocks in 
A. Such a basis P is called 2L-block diagonal. In fact, the 
number of blocks in P can be ML for any integer $M \ge 2$. 
We use M = 2 for simplicity; the extension to M > 2 is 
trivial. 

We also constrain P to be orthogonal. The motivation for 
this is the spark condition. In order to be able to solve Problem 3 
instead of Problem 2 we need to satisfy $\sigma(AP) > 2k$. By 
constraining P to be orthogonal we can use results similar 
to Proposition 1 in order to achieve this requirement with 
probability 1. 

The constrained BCS problem is then: 

Problem 8. Given the measurements B and the measurement 
matrix $A \in \mathbb{R}^{n \times nL}$, find the signal matrix X such that B = 
AX where X = PS for some orthogonal 2L-block diagonal 
matrix P and k-sparse matrix S. 

In this new setting the size of the measurement matrix A is 
n x nL, where n is the number of measurements and L is the 
number of n x n blocks in A, which equals the compression 
ratio. Moreover, the length of the signals is m = nL, and the 
size of the basis P is nL x nL. Since P is 2L-block diagonal, 
the size of its blocks is $\frac{n}{2} \times \frac{n}{2}$. Therefore, n must be even. 

This constrained problem can be useful for instance in 
multichannel systems, where the signals from each channel 
are sparse under separate bases. In such systems we can 
construct X by concatenating signals from several different 
channels, and compressively sampling them. For example, 
in microphone arrays [36] or antenna arrays [37], we can 
divide the samples from each microphone / antenna into time 
intervals in order to obtain the ensemble of sampled signals 
B. Each column of B is a concatenation of the signals from 
all the microphones / antennas over the same time interval. 



A. Uniqueness Conditions 

To ensure a unique solution to Problem 8 we need the 
DL solution given B to be unique. Therefore, we assume that 
the richness conditions on S and the spark condition on AP 
are satisfied. Then, Problem 8 is equivalent to the following 
problem: 

Problem 9. Given the matrices D and A, which have more 
columns than rows, find an orthogonal $\tilde{P}$ such that $D = A\tilde{P}$, 
and $\tilde{P} = PQ$ for some signed permutation matrix Q and 
orthogonal 2L-block diagonal matrix P. 

In order to discuss conditions for uniqueness of the solution 
to Problem 9 we introduce the following definition. 

Definition 10. Denote $A = [A_1, \ldots, A_L]$, such that $A_i \in \mathbb{R}^{n \times n}$ 
for any $1 \le i \le L$. A is called inter-block diagonal if there 
are two indices $i \ne j$ for which the product 

$$A_i^T A_j = \begin{bmatrix} R_1 & R_2 \\ R_3 & R_4 \end{bmatrix}$$

satisfies: 

$$\operatorname{rank}(R_1) = \operatorname{rank}(R_4), \qquad \operatorname{rank}(R_2) = \operatorname{rank}(R_3) = \frac{n}{2} - \operatorname{rank}(R_1).$$

In particular if the product $A_i^T A_j$ is 2-block diagonal then A 
is inter-block diagonal. 

With this definition in hand we can now define the conditions 
for the uniqueness of Problem 9. 

Theorem 11. If $A \in \mathbb{R}^{n \times nL}$ is a union of L orthogonal bases, 
which is not inter-block diagonal, and $\sigma(AP) = n + 1$, then 
the solution to Problem 9 is unique. 

The proof of this theorem uses the next lemma. 

Lemma 12. Assume P and $\hat{P}$ are both orthogonal 2L-block 
diagonal matrices, and A satisfies the conditions of 
Theorem 11. If $A\hat{P} = APQ$ for some signed permutation 
matrix Q, then $\hat{P} = PQ$. 

In general, since A has a null space, if the matrices A, P, $\hat{P}$ 
did not have their special structures, then the equality $A\hat{P} = 
APQ$ would not imply $\hat{P} = PQ$. However, according to 
Lemma 12, under the constraints on A, P, $\hat{P}$ this is guaranteed. 
The full proof of Lemma 12 appears in Appendix A. Here we 
present only the proof sketch. 

Proof sketch: It is easy to see that due to the orthogonality 
of the blocks of A, if Q is block diagonal then $A\hat{P} = APQ$ 
implies $\hat{P} = PQ$. Therefore, we need to prove that Q is 
necessarily block diagonal. Denote D = AP. In general the 
multiplication DQ can yield three types of changes in D. It 
can mix the blocks of D, permute the order of the blocks of 
D, and permute the columns inside each block. Q is block 
diagonal if and only if it permutes only the columns inside 
each block, but does not mix the blocks or change their outer 
order. First we prove that Q cannot mix the blocks of D. 
For this we use the condition on the spark of D, and the 
orthogonality of the blocks. Next we prove that Q cannot 
change the outer order of the blocks. This time we use the 
fact that both P and $\hat{P}$ have 2L blocks and that A is not 
inter-block diagonal. Therefore, Q can only permute the columns 
inside each block, which implies it is block diagonal. ■ 

If P and $\hat{P}$ have only L blocks instead of 2L, then Q can 
change the outer order of the blocks of D, so that it does 
not have to be block diagonal. Therefore, if the constraint on 
P were that it has L blocks instead of 2L, then Lemma 12 
would be incorrect, and the solution to Problem 9, 
and therefore to Problem 8, would not be unique. On the other 
hand the extension of the proof of Lemma 12 to ML blocks 
where M > 2 is trivial. 

Proof of Theorem 11: The proof we provide for Theorem 11 
is constructive, although far from being a practical method to 
deploy. Denote the desired solution of Problem 9 
by $\tilde{P} = PQ$, and denote: 

$$A = [A_1, \ldots, A_L], \qquad P = \begin{bmatrix} P^1 & & \\ & \ddots & \\ & & P^{2L} \end{bmatrix},$$

where $A_i$ for i = 1, ..., L and $P^j$ for j = 1, ..., 2L are all 
orthogonal matrices. 

We first find a permutation matrix $Q_D$ such that $\hat{D} = 
DQ_D = A\hat{P}$, where $\hat{P}$ is an orthogonal 2L-block diagonal 
matrix. There is always at least one such permutation. For 
instance, we can choose $Q_D$ to equal the absolute value of 
$Q^T$. In this case $\hat{P}$ equals P up to the signs, and therefore it 
is necessarily orthogonal 2L-block diagonal. 

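Denote the blocks of $\hat{P}$ by $\hat{P}^j$ for j = 1, ..., 2L, and note 
that 

$$\hat{D} = [\hat{D}_1, \ldots, \hat{D}_L] = \left[ A_1 \begin{bmatrix} \hat{P}^1 & \\ & \hat{P}^2 \end{bmatrix}, \ldots, A_L \begin{bmatrix} \hat{P}^{2L-1} & \\ & \hat{P}^{2L} \end{bmatrix} \right].$$

Since $A_i$ are orthogonal for all i = 1, ..., L, we can recover 
the blocks of $\hat{P}$ by 

$$\begin{bmatrix} \hat{P}^{2i-1} & \\ & \hat{P}^{2i} \end{bmatrix} = A_i^T \hat{D}_i,$$

such that 

$$\hat{P} = \begin{bmatrix} A_1^T \hat{D}_1 & & \\ & \ddots & \\ & & A_L^T \hat{D}_L \end{bmatrix}.$$

Since both P and $\hat{P}$ are orthogonal 2L-block diagonal, according 
to Lemma 12 the equality $\hat{D} = A\hat{P} = APQQ_D$ 
implies $\hat{P} = PQQ_D$. Therefore, we can recover $\tilde{P}$ by 
$\tilde{P} = PQ = \hat{P}Q_D^T$. ■ 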

The conclusion from Theorem 11 is that if the richness 
conditions on S are satisfied and A satisfies the conditions of 
Theorem 11, then the solution to Problem 8 is unique. 

As proven in Appendix B, one way to guarantee that A 
satisfies the conditions of Theorem 11 with probability 1 is 
to generate it randomly from an i.i.d. Gaussian distribution 
and perform a Gram-Schmidt process on each block in order 
to make it orthogonal. This claim is similar to Proposition 1, 
except that the statistics of A are a bit different due to the 
Gram-Schmidt process. 

B. The OBD-BCS Algorithm 

Although the uniqueness proof is constructive it is far from 
being practical. In order to solve Problem 8 by following the 
uniqueness proof one needs to perform a DL algorithm on 
B, resulting in D, S. Then go over all the permutations $\hat{D} = 
DQ_D$, and look for $Q_D$ such that the matrices $A_i^T \hat{D}_i$, for 
all i = 1, ..., L, are 2-block diagonal. After finding such a 
permutation the recovery of X is 

$$\hat{X} = \begin{bmatrix} A_1^T \hat{D}_1 & & \\ & \ddots & \\ & & A_L^T \hat{D}_L \end{bmatrix} Q_D^T S.$$




The problem with this method is the search for the permutation 
$Q_D$. There are m! different permutations of the columns 
of D, where m = nL is the length of the signals, while 
only $[(\frac{n}{2})!]^{2L}$ of them satisfy the requirement (see Appendix 
C). As m and L grow the relative fraction of the desirable 
permutations decreases. For instance, for signals of length 
m = 16 and a compression ratio of L = 2 only $1.58 \cdot 10^{-6}\%$ of 
the permutations satisfy the requirement. For the same signals 
but a higher compression ratio of L = 4 only $1.22 \cdot 10^{-9}\%$ 
satisfy the condition, and for longer signals of length m = 64 
and L = 2 only $1.51 \cdot 10^{-34}\%$ satisfy the requirement. 
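
These percentages can be reproduced directly from the two counts. The short sketch below evaluates the ratio of the number of admissible permutations to m!, with n = m / L; the function name `admissible_percent` is hypothetical.

```python
from math import factorial

def admissible_percent(m, L):
    """Percentage of the m! column permutations that keep the 2L-block structure."""
    n = m // L                                  # number of measurements
    admissible = factorial(n // 2) ** (2 * L)   # [(n/2)!]^(2L)
    return 100.0 * admissible / factorial(m)

for m, L in [(16, 2), (16, 4), (64, 2)]:
    print(f"m={m}, L={L}: {admissible_percent(m, L):.3g} %")
```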

Therefore, a systematic search is not practical, even for short 
signals. Moreover, in practice the output of the DL algorithm 
contains some error, so that even for the correct permutation 
the matrices $A_i^T \hat{D}_i$ are not exactly 2-block diagonal, which 
renders the search even more complicated. Although there 
exist suboptimal methods for permutation problems such as 
[38], these techniques are still computationally intensive and 
are sensitive to noise. 

Instead we present the orthogonal block diagonal BCS 
(OBD-BCS) algorithm for the solution of Problem 8, which 
is, in theory, equivalent to DL followed by the above 
postprocessing. However, it is much more practical and simple. 
This algorithm is a variation of the DL algorithm in [21], 
[22], which learns a dictionary under the constraint that 
the dictionary is a union of orthogonal bases. Given B the 
algorithm in [21], [22] aims to solve 

$$\min_{D,S} \|B - DS\|_F^2 \qquad (12)$$

s.t. S is k-sparse and D is a union of orthogonal bases. 

In the BCS case P is orthogonal 2L-block diagonal and A 
is a union of L orthogonal bases. Therefore, the equivalent 
dictionary is: 

$$D = AP = \left[ A_1 \begin{bmatrix} P^1 & \\ & P^2 \end{bmatrix}, \ldots, A_L \begin{bmatrix} P^{2L-1} & \\ & P^{2L} \end{bmatrix} \right].$$

Since all $A_i$ and $P^j$ are orthogonal, here too D is a union of 
orthogonal bases. The measurement matrix A is known and 
we are looking for an orthogonal 2L-block diagonal matrix P 
and a sparse matrix S such that B = APS. This leads to the 
following variant of (12): 

$$\min_{P,S} \|B - APS\|_F^2 \qquad (13)$$

s.t. S is k-sparse and P is orthogonal 2L-block diagonal. 

The algorithm in [21], [22] consists of two alternating steps. 
The first step is sparse coding, in which the dictionary D is 
fixed and the sparse matrix S is updated. The second step is 
dictionary update, in which S is fixed and D is updated. This 
algorithm finds the dictionary D = AP and the sparse matrix 
S but not the basis P, and consequently, not the signal matrix 
X = PS. 

In OBD-BCS we follow similar steps. The first step is again 
sparse coding, in which P is fixed and S is updated. The 
second step is basis update, in which S is fixed and P is 
updated. The difference between OBD-BCS and the algorithm 
in [21], [22] is mainly in the second step, where we add the 
prior knowledge of the measurement matrix A and the block 
diagonal structure of P. In addition, we use a different CS 
algorithm in the sparse coding step. 

We now discuss in detail the two steps of OBD-BCS. 

1) Sparse Coding: In this step P is fixed so that the 
optimization in (13) becomes: 

$$\min_S \|B - APS\|_F^2 \quad \text{s.t. } S \text{ is } k\text{-sparse}. \qquad (14)$$

It is easy to see that (14) is separable in the columns of S. 
Therefore, for each column of B and S we need to solve 

$$\min_s \|b - APs\|_2^2 \quad \text{s.t.} \quad \|s\|_0 \le k, \qquad (15)$$

where s, b are the appropriate columns of S, B respectively. 
This is a standard CS problem, as in (7), with the additional 
property that the combined measurement matrix D = AP is a 
union of orthogonal bases. This property is used by the block 
coordinate relaxation (BCR) algorithm ETJ, E2, E2). The 
idea behind this algorithm is to divide the elements of s into 
blocks corresponding to the orthogonal blocks of D. In each 
iteration all the blocks of s are fixed except one, which is 
updated using soft thresholding. The DL algorithm proposed 
by [21], [22] is a variation of the BCR algorithm, which aims 
to improve its convergence rate. In OBD-BCS we can also use 
this variation. However, experiments showed that the results 
are about the same as the results with OMP. Therefore, we use 
OMP in order to update the sparse matrix S, when the basis 
P is fixed. 

2) Basis Update: In this step the sparse matrix S is fixed 
and P is updated. Divide each of the nL x N matrices S and 
X into 2L submatrices of size $\frac{n}{2} \times N$ such that: 

$$S = \begin{bmatrix} S^1 \\ \vdots \\ S^{2L} \end{bmatrix}, \qquad X = \begin{bmatrix} X^1 \\ \vdots \\ X^{2L} \end{bmatrix}.$$

Divide each orthogonal block of A into two blocks: $A_i = 
[A^{2i-1}, A^{2i}]$ for i = 1, ..., L, such that: 

$$A = [A_1, \ldots, A_L] = [A^1, A^2, \ldots, A^{2L-1}, A^{2L}].$$

With this notation $X^j = P^j S^j$ and $B = \sum_{j=1}^{2L} A^j P^j S^j$. 
Therefore, (13) becomes: 

$$\min_{P^1, \ldots, P^{2L}} \Big\| B - \sum_{j=1}^{2L} A^j P^j S^j \Big\|_F^2 \qquad (16)$$

s.t. $P^1, \ldots, P^{2L}$ are orthogonal. 

To minimize (16), we iteratively fix all the blocks $P^j$ for j = 
1, ..., 2L except one, denoted by $P^i$, and solve 

$$\min_{P^i} \|B^i - A^i P^i S^i\|_F^2 \quad \text{s.t. } P^i \text{ is orthogonal}, \qquad (17)$$

where $B^i = B - \sum_{j \ne i} A^j P^j S^j$. With slight abuse of notation, 
from now on we abandon the index i. 

Since P is orthogonal and A is constructed of columns from 
an orthogonal matrix, $P^T A^T A P = I$, and $\|APS\|_F^2 = \|S\|_F^2$. 
Thus, (17) reduces to 

$$\max_P \operatorname{Tr}[B^T A P S] \quad \text{s.t. } P \text{ is orthogonal}. \qquad (18)$$




TABLE IV 
THE OBD-BCS ALGORITHM 

Inputs: 

• $B \in \mathbb{R}^{n \times N}$ - measurements 

• $A \in \mathbb{R}^{n \times nL}$ - measurement matrix (union of L orthogonal bases) 

Outputs: 

• $\hat{X} \in \mathbb{R}^{nL \times N}$ - reconstructed signal matrix 

Algorithm: 

• Initiate P = I (the identity). 

• Repeat until a stopping criterion is reached: 

  o Sparse coding: find the sparsest S such that B = APS, 
    for instance using OMP. 

  o Basis update: for all i = 1, ..., 2L: 

      Calculate $B^i = B - \sum_{j \ne i} A^j P^j S^j$. 

      Use SVD: $S^i (B^i)^T A^i = U \Sigma V^T$. 

      Update: $P^i = V U^T$. 

• Calculate: $\hat{X} = PS$. 



Let the singular value decomposition (SVD) of the matrix 
$R = SB^T A$ be $R = U \Sigma V^T$, where U, V are orthogonal 
matrices and $\Sigma$ is a diagonal matrix. Using this notation we 
can manipulate the trace in (18) as follows: 

$$\operatorname{Tr}[B^T A P S] = \operatorname{Tr}[S B^T A P] = \operatorname{Tr}[\Sigma V^T P U].$$

The matrix $Z = V^T P U$ is orthogonal if and only if P is 
orthogonal. Therefore, (18) is equivalent to 

$$\max_Z \operatorname{Tr}[\Sigma Z] \quad \text{s.t. } Z \text{ is orthogonal}.$$

If the matrix $R = SB^T A$ has full rank then $\Sigma$ is invertible. 
In this case the maximization is achieved only for Z = I, and 
therefore $P^i = V U^T$ is the unique minimum of (17). Even if 
R does not have full rank, $P^i = V U^T$ achieves a minimum of 
(17). 

Table IV summarizes the OBD-BCS algorithm. Note that the 
initiation can be any 2L-block diagonal matrix, not necessarily 
the identity matrix as written in the table; however, the 
identity matrix is simple to implement. This algorithm is much 
simpler than following the uniqueness proof, which requires a 
combinatorial permutation search. Each iteration of the 
OBD-BCS algorithm uses a standard CS algorithm and 2L SVDs. 
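
The single-block basis update can be sketched as follows. This is an illustrative example under assumed toy sizes, not the paper's code: `update_block` is a hypothetical helper that forms the matrix R from the current S block, the residual measurements and the matching columns of A, and returns the product of V and the transpose of U. In the noiseless check below, where the residual is generated by a known orthogonal block and R has full rank, the update recovers that block.

```python
import numpy as np

def update_block(A_i, B_i, S_i):
    """Orthogonal P minimizing ||B_i - A_i P S_i||_F for A_i with orthonormal columns."""
    U, _, Vt = np.linalg.svd(S_i @ B_i.T @ A_i)   # R = U Sigma V^T
    return Vt.T @ U.T                              # P = V U^T

rng = np.random.default_rng(0)
n, N = 8, 20                                       # assumed toy sizes
d = n // 2                                         # block size n/2

# A_i: n/2 columns taken from a random n x n orthogonal matrix.
A_i = np.linalg.qr(rng.standard_normal((n, n)))[0][:, :d]
P_true = np.linalg.qr(rng.standard_normal((d, d)))[0]
S_i = rng.standard_normal((d, N))
B_i = A_i @ P_true @ S_i                           # noiseless residual measurements

P_hat = update_block(A_i, B_i, S_i)
print(np.allclose(P_hat, P_true))
```

Using the SVD here is the standard solution of an orthogonal Procrustes problem, which is why each basis update costs one small SVD per block.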

An important question that arises is whether the OBD-BCS 
algorithm converges. To answer this question we look at each 
step separately. If the sparse coding step is performed perfectly 
it solves (14) for the current P. That is, the objective of (13) 
is reduced or at least stays the same. In practice, for small 
enough k the CS algorithm converges to the solution of (14). 
However, in order to guarantee that the objective of (13) is reduced 
or at least not increased in this step, we can always compare 
the new solution after this step with the one from the previous 
iteration and choose the better of them. 

Note that this step is performed separately on each column 
of S. That is, we can choose to keep only some of the columns 
from the previous iteration, while the rest are updated. If at 
least part of the columns are updated then the next basis update 
step changes the basis P, so that in the following sparse 
coding step we can get a whole new matrix S. Therefore, 
the decision to keep the results from the previous iteration 
does not imply we keep getting the same results in all the 
next iterations. Another possibility is to keep only the support 
of the previous solution and update the values of the nonzero 
elements using least-squares. In practice, in our simulations 
the algorithm converges even without any comparison to the 
previous iteration. 

The basis update step is divided into 2L steps. In each, all 
the blocks of P are fixed except one, which is updated to 
minimize (17). Since (17) is (16) with the other blocks held 
fixed, the objective of (16), which is equivalent to (13) with 
fixed S, is reduced or at least stays the same in each of the 
2L steps, and hence in the basis update step as a whole. 

Thus, as in [21], [22], the algorithm we are based on, and 
as in other DL algorithms such as [20], [24], we cannot prove 
that the OBD-BCS algorithm converges to the unique minimum of 
(13). However, we can guarantee that under specific conditions 
there is a unique minimum and that the objective function is 
reduced or at least stays the same in each step of the algorithm. 
Furthermore, as can be seen in the next section, the OBD-BCS 
algorithm performs very well in simulations on synthetic data. 

C. OBD-BCS Simulations 

As in the first two constraints we evaluated the algorithm 
performance on synthetic data. The signal matrix X had 64 
rows and was generated as a product of a random sparse matrix 
- S and a random orthogonal 4-block diagonal matrix - P. The 
value of the nonzero elements in S was generated randomly 
from a normal distribution, and the four orthogonal blocks 
of P were generated from a normal distribution followed by 
a Gram Schmidt process. The measurement matrix A was 
constructed of two random 32 x 32 orthogonal matrices, that 
were generated from a normal distribution followed by a Gram 
Schmidt process. The number of signals and the sparsity level 
were gradually changed in order to investigate their influence. 

The stopping rule of the algorithm was based on a maximal 
number of iterations and the amount of change in the matrices 
S and P. If the change from the last iteration was too small, 
or if the maximal number of iterations was reached, then the 
algorithm stopped. In most cases the algorithm stopped due to 
small change between iterations after about 30 iterations. 
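
A stopping rule of this kind might look like the sketch below. The text does not state how the change in S and P was measured or which thresholds were used, so the relative Frobenius-norm change, `max_iter` and `tol` are all assumptions made for illustration.

```python
import numpy as np

def should_stop(S_new, S_old, P_new, P_old, iteration, max_iter=100, tol=1e-6):
    """Stop when the iteration budget is spent or both S and P barely changed.

    max_iter, tol and the relative Frobenius-norm measure are hypothetical choices.
    """
    if iteration >= max_iter:
        return True
    change_S = np.linalg.norm(S_new - S_old) / max(np.linalg.norm(S_old), 1e-12)
    change_P = np.linalg.norm(P_new - P_old) / max(np.linalg.norm(P_old), 1e-12)
    return bool(change_S < tol and change_P < tol)
```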

First we examined the influence of two parameters, N - the 
number of signals needed for the reconstruction, and k - the 
sparsity level. Fig. 2 considers the influence of N where the
sparsity level is set to k = 4. For each value of N from 150 to
2500 the error presented in the upper graph is an average over
20 simulations of the OBD-BCS algorithm. In each simulation
the sparse vectors and the orthogonal matrix were generated
independently, but the measurement matrix was not changed.
The error of each signal was calculated according to ©. 

For comparison, the lower graph in Fig. 2 is the average
error of a standard CS algorithm that was performed on the
same data, and used the real basis P, which is unknown
in practice. The CS algorithm we used was again OMP. As
expected, the results of the CS algorithm are independent 
of the number of signals, since it is performed separately 
and independently on each signal. The average error of this 
algorithm is 0.08%. The reason for this nonzero error, although
P is known, is that for a small portion of the signals the OMP
algorithm fails.

[Figure 2: plot titled "Error Vs. Number of Signals"; curves: OBD-BCS and CS with the real P; horizontal axis: N, from 500 to 2500.]

Fig. 2. Reconstruction error as a function of the number of signals, for
sparsity level of k = 4.

It is clear from Fig. 2 that for N > 500 the reconstruction
results of the proposed algorithm are successful and similar
to those obtained when P is known. Similarly to the conclusion
in [21], the reconstruction is successful even for N
much smaller than the number needed in order to satisfy the
sufficient richness conditions, which is $\binom{m}{k}(k+1) \approx 3 \cdot 10^6$.
As in most DL algorithms, the algorithm in [21], [22] was
evaluated by counting the number of columns of the dictionary
that are detected correctly. The conclusions of [21], [22] are
that their algorithm can find about 80% of the columns when
the number of signals is at least 20n = 640, and can find all the
columns when the number of signals is at least 50n = 1600.
Using the same measurement matrix dimensions as in [21],
[22], the minimal number of signals the OBD-BCS algorithm
requires is only 500.
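
As a quick check of the figure of about 3 · 10^6 quoted above, the count can be evaluated directly. The sketch assumes the garbled expression is the binomial coefficient of the signal length m = 64 and the sparsity level k = 4, multiplied by k + 1; the function name is hypothetical.

```python
from math import comb

def richness_bound(m, k):
    """Evaluate binom(m, k) * (k + 1), the signal count quoted for sufficient richness."""
    return comb(m, k) * (k + 1)

print(richness_bound(64, 4))  # 3176880, roughly 3e6
```

Under this reading, the 500 signals that sufficed in the simulation are several orders of magnitude fewer than the sufficient condition asks for.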

In order to examine the influence of k we performed the 
same experiment as before but for different values of k < 10. 
The results are presented in Fig. 3. It can be seen that for
all values of k the graph has the same basic shape: the error 
decreases with N until a critical N, after which the error is 
almost constant. As k grows this critical N increases and so 
does the value of the constant error. The graphs for k = 1, 
k = 2, k = 3 follow the same pattern; they are not in the 
figure since they are not visible on the same scale as the rest. 

Next we investigated the influence of noise on the algorithm. 
In this simulation the noisy measurements B were calculated 
as B = APS + W, where the elements of W were white 
Gaussian noise. For each noise level 20 simulations were per- 
formed and the average error was calculated. In all simulations 
k = 4 and N = 800. Table V summarizes the results of the
OBD-BCS algorithm and those of the OMP algorithm, which uses
the real P. It is clear from the table that in the noiseless case
the errors of both algorithms are similar; therefore, in this case
the prior knowledge of the basis P can be avoided. As the
SNR decreases both errors increase, but the error of the OBD-BCS
algorithm increases a bit faster than that of the CS algorithm.
However, the difference is not very big.

[Figure 3: plot titled "Error Vs. Number of Signals"; horizontal axis: N, from 500 to 2500.]

Fig. 3. Reconstruction error as a function of the number of signals for
different values of k.

TABLE V

RECONSTRUCTION ERROR FOR DIFFERENT NOISE LEVELS

SNR      CS        OBD-BCS
∞        0.008%    0.008%
35dB     0.82%     0.88%
30dB     1.54%     1.64%
25dB     2.95%     3.23%
20dB     5.81%     6.10%
15dB     12.03%    12.58%
10dB     25.11%    26.04%
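
The noisy measurement model B = APS + W can be sketched as follows. The text does not define the SNR explicitly, so this sketch assumes it is the ratio of the average power of the clean measurements to the average noise power, in dB; the function name and the random stand-in for APS are illustrative only.

```python
import numpy as np

def add_noise(B_clean, snr_db, rng):
    """Return B_clean + W, with white Gaussian W scaled to the requested SNR (dB)."""
    signal_power = np.mean(B_clean ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))  # assumed SNR definition
    W = np.sqrt(noise_power) * rng.standard_normal(B_clean.shape)
    return B_clean + W

rng = np.random.default_rng(0)
B_clean = rng.standard_normal((32, 800))  # stand-in for A P S with N = 800
B_noisy = add_noise(B_clean, snr_db=20.0, rng=rng)
measured_snr = 10 * np.log10(np.mean(B_clean ** 2) / np.mean((B_noisy - B_clean) ** 2))
```

Here `measured_snr` should come out close to the requested 20 dB, up to sampling fluctuation.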

VII. Comparative Simulation 

The following simulation illustrates the difference between 
the three BCS methods presented in this work. In this simu- 
lation the length of the signals was m = 128, the sparsity 
level was k = 6, the number of signals was N = 2000, 
and the compression ratio was L = 2. The synthetic data
was generated as in Section VI-C, but this time, instead
of generating $P \in \mathbb{R}^{128 \times 128}$ randomly, we used

$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 & & & \\ 1 & 1 & & & \\ & & \ddots & & \\ & & & 1 & -1 \\ & & & 1 & 1 \end{pmatrix},$$

which can be viewed as an orthogonal 4-block diagonal matrix
(each block is 16-block diagonal by itself).
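
The structured basis used here, built from repeated 2 x 2 blocks with rows (1, -1) and (1, 1), can be constructed and checked numerically. The sketch below assumes a 1/sqrt(2) normalization of each block, which is what makes the matrix orthogonal.

```python
import numpy as np

m = 128
# 2 x 2 building block; the 1/sqrt(2) scaling is an assumption needed for orthogonality.
H = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)
# 64 copies of H along the diagonal give a 128 x 128 matrix.
P = np.kron(np.eye(m // 2), H)

is_orthogonal = np.allclose(P.T @ P, np.eye(m))
nonzeros_per_column = (P != 0).sum(axis=0)  # every column has exactly 2 nonzeros
```

Each 32 x 32 diagonal block of this P contains 16 of the 2 x 2 blocks, which matches the "4-block diagonal, each block 16-block diagonal" description, and the two nonzeros per column are consistent with a sparsity level of 2 for the columns of P.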

We used five different methods for the reconstruction of 
these signals. 

1) CS algorithm with the real basis P. 

2) CS algorithm with an estimated basis $P_{DL}$.

3) The F-BCS method. 

4) The direct method for sparse BCS. 

5) The OBD-BCS algorithm. 



TABLE VI

DL ALGORITHM FOR ORTHOGONAL DICTIONARY

Inputs

• X - training set

• k - sparsity level

Outputs

• P - orthogonal dictionary

• S - sparse matrix

Algorithm

• Initiate P = I.

• Repeat until a stopping criterion is reached:

o Fix P and calculate $S = P^T X$.

o Keep only the k highest (absolute value) elements in each column of S.

o Fix S, and calculate the SVD: $SX^T = U \Sigma V^T$.

o Update $P = VU^T$.


In all the methods above we used OMP as the standard CS
algorithm. The first method serves as a reference for the rest.
It uses the real basis P, whose knowledge we are trying to
avoid. The second method is an intuitive way to reconstruct the
signals. Since the basis P is unknown, one can estimate it first
and then perform a CS algorithm which uses the pre-estimated
basis. We performed the estimation using a training set of 2000
signals and a DL algorithm. The estimated basis is denoted
by $P_{DL}$. There are several different DL algorithms, e.g., [20]-[22],
[24], [40]. However, in this case we have important prior
knowledge that the basis P is orthogonal 4-block diagonal.
One way of using this knowledge is dividing the signals
X into 4 blocks corresponding to the 4 blocks of P, and
estimating each block of P from the relevant block of X using
the algorithm in Table VI, which is designed for learning an
orthogonal basis.

Due to this structure of P and the sparsity of S, in each
column of X there are up to 12 nonzero elements. Therefore,
the identity matrix I was one of the bases in the finite set
$\Psi$ that we used. Specifically, we used the same set $\Psi$ as in
the simulations in Section IV. X had about twice as many
nonzero elements in each column compared to the real sparse
matrix S, such that X is 2k-sparse under I. Therefore, we
ran the F-BCS method with sparsity level of 2k instead of k.
Moreover, since P is sparse itself we used $\Phi = I$ as the base
dictionary in the sparse BCS method. It is easy to see that
$k_p = 2$.

Table VII reports the average error of all five methods,
calculated as in (8). As can be seen, the results of F-BCS are
much worse than all the others. This can be expected since
in this case X is 2k-sparse, so that the OMP reconstruction
is not as good. The error of the sparse BCS is also higher
than the rest. The reason for this is that in order for the direct
method of sparse BCS to work well the product $k_p k$ should
be small relative to n. In this case this product is not small
enough. Note that, though higher than the rest, the errors of
the sparse BCS and F-BCS are quite small. We performed the
same simulation with k = 3 and then the error of sparse BCS
was reduced to the level of the rest, but the error of F-BCS
was still high.

TABLE VII

RECONSTRUCTION ERROR OF DIFFERENT RECONSTRUCTION ALGORITHMS

Algorithm                 Error
CS with the real P        $10^{-5}$%
CS with $P = P_{DL}$      $10^{-5}$%
F-BCS                     0.522%
Sparse BCS                0.084%
OBD-BCS                   $10^{-5}$%

The results of both the OBD-BCS algorithm and the CS with
the estimated basis, neither of which used the knowledge
of the basis P, are similar to those of the algorithm which
used this knowledge. Thus, the prior knowledge of P can be
avoided. The advantage of OBD-BCS over the CS with the
estimated basis is that it does not require any training set, and
therefore can be used in applications where there is no access
to any full signals but only to their measurements.

VIII. Conclusions 

We presented the problem of BCS which aims to solve CS 
problems without the prior knowledge of the sparsity basis of 
the signals. Therefore, this work renders CS universal not only 
from the measurement process point of view, but also from the 
recovery point of view. 

We presented three different constraints on the sparsity ba- 
sis, that can be added to the BCS problem in order to guarantee 
the uniqueness of the solution to the BCS problem. Under 
each of these constraints we proved uniqueness conditions 
and proposed simple methods to retrieve the solution. All 
the proposed methods perform very well in simulations on 
synthetic data. In fact, when k is small enough and when 
enough signals are measured (only for the structural constraint 
case), the performance of our methods is similar to that of a
standard CS algorithm which uses the real, though unknown in practice,
sparsity basis. We also demonstrated through simulations the 
advantage of BCS over CS with an estimated sparsity basis. 
The advantage of BCS is that it does not require any training 
set, and therefore can be used in applications where there is 
no access to any full signals but only to their measurements. 

An interesting direction for future research is to examine
more ways to assure uniqueness, besides the three presented
here, and to weaken the constraints on the basis.
Appendix A

The following proves Lemma 12. That is, if P and $\tilde{P}$ are
both 2L-block diagonal matrices, A satisfies the conditions of
Theorem 11, and Q is a permutation matrix, then $A\tilde{P} = APQ$
implies $\tilde{P} = PQ$.

We begin this proof by proving that under the lemma's
conditions Q is necessarily block diagonal. After this is done,
the completion of the proof is straightforward. For any
$D = [D_1, \ldots, D_L] \in \mathbb{R}^{n \times nL}$ such that $D_1, \ldots, D_L \in \mathbb{R}^{n \times n}$,
the permutation DQ can yield three types of changes in D. It
can mix the blocks of D, permute the order of the blocks of
D, and permute the columns inside each block. Q is L-block
diagonal if and only if it permutes only the columns inside
each block, but does not mix the blocks or change their outer
order.

First we prove that Q cannot mix the blocks of D. We
denote by $\mathcal{Q}_B$ the group of all block permutation matrices,
which is the group of all the permutation matrices that keep
all blocks together. That is, if $Q \in \mathcal{Q}_B$ then when multiplying
DQ only the order of the blocks $D_1, \ldots, D_L$ and the order
of the columns inside the blocks change, but there is no
mixture between the blocks. After we prove that $Q \in \mathcal{Q}_B$ we
prove that Q also cannot change the outer order of the blocks,
and therefore must be block diagonal. In order to prove that
necessarily $Q \in \mathcal{Q}_B$, we use the next two lemmas.

Lemma A.1. If $D = [D_1, \ldots, D_L] \in \mathbb{R}^{n \times nL}$ is a union of
L orthogonal bases, and $\sigma(D) = n + 1$, then any set of n
orthogonal columns of D are necessarily all from the same
block of D.

Proof: Assume $\Gamma$ is a set of n orthogonal columns from
D. Denote $\Gamma = \Gamma_1 \cup \Gamma_2$, where $\Gamma_1$ is the set of columns
taken from $D_1$, and $\Gamma_2$ contains the rest of the columns in
$\Gamma$. Without loss of generality assume the set $\Gamma_1$ is not empty.
Since both $D_1$ and $\Gamma$ are orthogonal bases of $\mathbb{R}^n$, the span of
$\Gamma_2$ equals the span of the columns of $D_1$ which are not in $\Gamma$.
Therefore, the set of columns $\Gamma_2 \cup d$, where d is any column
from $D_1$ which is not in $\Gamma$, is linearly dependent unless
$\Gamma_2$ is empty. However, the set $\Gamma_2 \cup d$ contains at most n columns, so
that since $\sigma(D) = n + 1$ this set cannot be linearly dependent.
Therefore, $\Gamma_2$ is necessarily empty, such that all the columns
of $\Gamma$ are from the same block of D. ■

Lemma A.2. Assume $D = [D_1, \ldots, D_L] \in \mathbb{R}^{n \times nL}$ is a union
of L orthonormal bases, with $\sigma(D) = n + 1$, and $\tilde{D} = DQ$
for some permutation matrix Q. If $\tilde{D}$ is also a union of L
orthonormal bases, then $Q \in \mathcal{Q}_B$.

Proof: If there was a permutation $Q \notin \mathcal{Q}_B$ such that $\tilde{D} =
DQ$, it would imply that n columns of D, not all from the
same block, form one of the orthogonal blocks of $\tilde{D}$. However,
according to Lemma A.1 any n orthogonal columns must be
from the same block, and therefore $Q \in \mathcal{Q}_B$. ■

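We need to prove that the equality $A\tilde{P} = APQ$ implies
$\tilde{P} = PQ$. Denote the orthogonal blocks of A by $A_i$ for
$i = 1, \ldots, L$ and the orthogonal blocks of P and $\tilde{P}$ by $P^j$ and $\tilde{P}^j$
respectively for $j = 1, \ldots, 2L$. Also denote:

$$D = AP = \left[ A_1 \begin{pmatrix} P^1 & \\ & P^2 \end{pmatrix}, \ldots, A_L \begin{pmatrix} P^{2L-1} & \\ & P^{2L} \end{pmatrix} \right],$$
$$\tilde{D} = A\tilde{P} = \left[ A_1 \begin{pmatrix} \tilde{P}^1 & \\ & \tilde{P}^2 \end{pmatrix}, \ldots, A_L \begin{pmatrix} \tilde{P}^{2L-1} & \\ & \tilde{P}^{2L} \end{pmatrix} \right],$$

which are both unions of L orthogonal bases since $A_i$, $P^j$ and
$\tilde{P}^j$ are all orthogonal. Therefore, according to Lemma A.2,
$Q \in \mathcal{Q}_B$.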

Next we prove that Q also cannot change the outer order of
the blocks, and therefore must be L-block diagonal. Assume
by contradiction that Q changes the outer order of the blocks
of D. Without loss of generality we can assume this change
is a switch between the first two blocks of D. That is,

$$\tilde{D}_1 = D_2 Q_2 = A_2 \begin{pmatrix} P^3 & \\ & P^4 \end{pmatrix} Q_2,$$
$$\tilde{D}_2 = D_1 Q_1 = A_1 \begin{pmatrix} P^1 & \\ & P^2 \end{pmatrix} Q_1,$$

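where $Q_1, Q_2$ are the corresponding sub-matrices of Q which
permute the columns inside the blocks $D_1, D_2$. In order to
satisfy $\tilde{D} = A\tilde{P}$ we must have

$$\tilde{D}_1 = A_1 \begin{pmatrix} \tilde{P}^1 & \\ & \tilde{P}^2 \end{pmatrix} = A_2 \begin{pmatrix} P^3 & \\ & P^4 \end{pmatrix} Q_2,$$
$$\tilde{D}_2 = A_2 \begin{pmatrix} \tilde{P}^3 & \\ & \tilde{P}^4 \end{pmatrix} = A_1 \begin{pmatrix} P^1 & \\ & P^2 \end{pmatrix} Q_1.$$

Since $A_1$ and $A_2$ are orthogonal the above implies

$$\begin{pmatrix} \tilde{P}^1 & \\ & \tilde{P}^2 \end{pmatrix} = A_1^T A_2 \begin{pmatrix} P^3 & \\ & P^4 \end{pmatrix} Q_2, \qquad \begin{pmatrix} \tilde{P}^3 & \\ & \tilde{P}^4 \end{pmatrix} = A_2^T A_1 \begin{pmatrix} P^1 & \\ & P^2 \end{pmatrix} Q_1. \tag{A-1}$$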



If there is an orthogonal 2L-block diagonal matrix $\tilde{P}$ that
satisfies (A-1), then in contradiction to Lemma 12, $\tilde{P} \neq PQ$.
However, (A-1) implies:

$$A_1^T A_2 = \begin{pmatrix} \tilde{P}^1 & \\ & \tilde{P}^2 \end{pmatrix} Q_2^T \begin{pmatrix} (P^3)^T & \\ & (P^4)^T \end{pmatrix} = \begin{pmatrix} R_1 & R_2 \\ R_3 & R_4 \end{pmatrix}.$$

Due to the structure of the permutation matrix $Q_2$ and due
to the orthogonality of the blocks of P and $\tilde{P}$, the ranks of
$R_1, R_2, R_3, R_4$ must satisfy:

$$\mathrm{rank}(R_1) = \mathrm{rank}(R_4),$$
$$\mathrm{rank}(R_2) = \mathrm{rank}(R_3) = \frac{n}{2} - \mathrm{rank}(R_1).$$

Therefore, A is necessarily inter-block diagonal. However,
according to the conditions of Theorem 11, A is not inter-block
diagonal, so that the contradiction assumption is incorrect and
Q cannot change the outer order of the blocks, such that Q
must be L-block diagonal.

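Denote the diagonal blocks of Q by $Q_i$ for $i = 1, \ldots, L$,
such that:

$$\tilde{D} = \left[ A_1 \begin{pmatrix} \tilde{P}^1 & \\ & \tilde{P}^2 \end{pmatrix}, \ldots, A_L \begin{pmatrix} \tilde{P}^{2L-1} & \\ & \tilde{P}^{2L} \end{pmatrix} \right] = \left[ A_1 \begin{pmatrix} P^1 & \\ & P^2 \end{pmatrix} Q_1, \ldots, A_L \begin{pmatrix} P^{2L-1} & \\ & P^{2L} \end{pmatrix} Q_L \right].$$

Since all $A_i$ are orthogonal the above implies that for all
$i = 1, \ldots, L$

$$\begin{pmatrix} \tilde{P}^{2i-1} & \\ & \tilde{P}^{2i} \end{pmatrix} = \begin{pmatrix} P^{2i-1} & \\ & P^{2i} \end{pmatrix} Q_i,$$

such that $\tilde{P} = PQ$. ■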
In fact the above proves not only that Q is L-block diagonal,
it is also 2L-block diagonal. Note that the extension of this
proof to the case where P and $\tilde{P}$ have ML blocks, for M > 2,
is trivial. However, if P and $\tilde{P}$ had L blocks instead of 2L,
this proof would not work. That is since in this proof, in order
to eliminate solutions of the form of (A-1), we use the 2-block
diagonal structure of the matrices. If there were only L
blocks, then besides the solution $\tilde{P} = PQ$ there would have
been another possibility, which is:

$$\tilde{P} = \begin{pmatrix} A_1^T A_2 P_2 Q_2 & & & & \\ & A_2^T A_1 P_1 Q_1 & & & \\ & & P_3 Q_3 & & \\ & & & \ddots & \\ & & & & P_L Q_L \end{pmatrix},$$

where $P_1, \ldots, P_L$ are the L blocks of P and $Q_1, \ldots, Q_L$ are the
corresponding blocks of Q. Obviously in this case $\tilde{P} \neq PQ$.

Appendix B 

The following proves that if $A = [A_1, \ldots, A_L] \in \mathbb{R}^{n \times nL}$ is
a union of L orthogonal bases, where each block is generated
randomly from an i.i.d. Gaussian distribution followed by a
Gram-Schmidt process, then with probability 1 $\sigma(A) = n + 1$
and A is not inter-block diagonal (Definition 10). Multiplication
by an orthogonal P does not change the statistics,
therefore if $\sigma(A) = n + 1$ with probability 1, then also
$\sigma(AP) = n + 1$ with probability 1. Therefore, such an A
satisfies the conditions of Theorem 11 with probability 1.

We begin the proof by noting that we can look at the
generation of each block of A as follows. The first column
$a_1$ is generated randomly from $\mathbb{R}^n$. The second column $a_2$
is generated randomly from the $n - 1$ dimensional space
orthogonal to $a_1$. The column $a_3$ is generated randomly from
the $n - 2$ dimensional space orthogonal to the span of
$\{a_1, a_2\}$, and similarly any $a_i$ is generated randomly from the
space orthogonal to the span of all previous columns, whose
dimension is $n - i + 1$. We start by proving $\sigma(A) = n + 1$.
This proof uses the next lemma.

Lemma B.3. Assume $G \in \mathbb{R}^{n \times n}$ is generated as an i.i.d.
Gaussian matrix followed by a Gram-Schmidt process, and
U is a given space of dimension d. If d < n then with
probability 1 none of the columns of G are in U.

Proof: Denote the columns of G by $g_i$ for $i = 1, \ldots, n$. Since
d < n the space U has zero volume in $\mathbb{R}^n$. $g_1$ is generated
randomly from $\mathbb{R}^n$, and therefore with probability 1 $g_1$ is not
in U. For any other $1 < i \le n$, $g_i$ is generated randomly from
$G_i$, which is the space orthogonal to the $i - 1$ previous columns
in G. The dimension of $G_i$ is $d_i = n - i + 1$. In this case we need to
look at the probability to generate $g_i$ in the intersection $U \cap G_i$.
If $d < d_i$ then obviously this intersection has zero volume in
$G_i$, so that $g_i$ is not in U with probability 1. Furthermore, if
$d \ge d_i$ then due to the randomness of the columns of G, $G_i$
is not entirely contained in U with probability 1. Therefore,
here too $U \cap G_i$ has zero volume in $G_i$, such that $g_i$ is not in
U with probability 1. ■

Assume $\Gamma$ is a set of $\sigma(A)$ linearly dependent columns from
A. Denote $\Gamma = \Gamma_1 \cup \Gamma_2$, where $\Gamma_1$ is the subset of $\Gamma$ which
contains only the columns taken from the block $A_1$, and $\Gamma_2$
are the rest of the columns in $\Gamma$. Without loss of generality
assume $\Gamma_1$ is not empty. Moreover, since $A_1$ is orthogonal
$\Gamma_1$ is also orthogonal, such that in order for $\Gamma$ to be linearly
dependent $\Gamma_2$ also cannot be empty.

Any n + 1 columns from A are linearly dependent such that
$\sigma(A) \le n + 1$. Therefore, $|\Gamma| \le n + 1$ so that $|\Gamma_2| \le n$.
If $|\Gamma_1| = n$ or $|\Gamma_2| = n$ then necessarily $\sigma(A) = |\Gamma| = n + 1$.
Assume by contradiction that $\sigma(A) = |\Gamma| \le n$, such that
$|\Gamma_1| < n$ and $|\Gamma_2| \le n - |\Gamma_1|$. If $\Gamma_1$ contains only one column,
denoted by $\gamma_1$, then since $\Gamma$ is linearly dependent $\gamma_1$ must be in
the span of $\Gamma_2$. However, the dimension of this span is at most
$|\Gamma_2| \le n - 1$, such that according to Lemma B.3 the probability
for this is zero. If $\Gamma_1$ contains only two columns, denoted by
$\gamma_1, \gamma_2$, then $\gamma_2$ must be in the span of $\Gamma_2 \cup \gamma_1$. However, the
dimension of this space is at most $|\Gamma_2| + 1 \le n - 1$, such that
according to Lemma B.3 the probability for this is again zero.
We can keep increasing the cardinality of $\Gamma_1$ and as long as
$|\Gamma| \le n$ the probability for $\Gamma$ to be linearly dependent will be
zero. Therefore, the contradiction assumption is incorrect with
probability 1, so that $\sigma(A) = |\Gamma| = n + 1$ with probability 1.

Next we need to prove that A is not inter-block diagonal.
Denote for any pair of indices $i \neq j$:

$$A_i^T A_j = \begin{pmatrix} R_1 & R_2 \\ R_3 & R_4 \end{pmatrix}. \tag{B-2}$$

For A to be inter-block diagonal there should be a pair $i \neq j$
for which:

$$\mathrm{rank}(R_1) = \mathrm{rank}(R_4), \qquad \mathrm{rank}(R_2) = \mathrm{rank}(R_3) = \frac{n}{2} - \mathrm{rank}(R_1). \tag{B-3}$$

However, due to the randomness of $A_i, A_j$ the blocks
$R_1, R_2, R_3, R_4$ all have full rank with probability 1, so that
$\mathrm{rank}(R_1) = \mathrm{rank}(R_2) = \frac{n}{2}$ and $\mathrm{rank}(R_2) \neq \frac{n}{2} - \mathrm{rank}(R_1)$.
Therefore, A is not inter-block diagonal with probability 1.
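
The full-rank claim can be illustrated numerically. The sketch below assumes the four blocks of the product of one transposed orthogonal block of A with another are each of size n/2 x n/2 (consistent with the rank n/2 quoted above), uses QR in place of Gram-Schmidt, and picks n = 32 and a seed purely for illustration; a single draw is of course an illustration, not a proof.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Orthogonalize an i.i.d. Gaussian matrix (QR stands in for Gram-Schmidt)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def block_ranks(A_i, A_j):
    """Ranks of the four (n/2 x n/2) blocks R1, R2, R3, R4 of A_i^T A_j."""
    h = A_i.shape[0] // 2
    R = A_i.T @ A_j
    blocks = [R[:h, :h], R[:h, h:], R[h:, :h], R[h:, h:]]
    return [int(np.linalg.matrix_rank(b)) for b in blocks]

rng = np.random.default_rng(1)
n = 32
ranks = block_ranks(random_orthogonal(n, rng), random_orthogonal(n, rng))
# Inter-block diagonality would need rank(R2) = n/2 - rank(R1).
violates_b3 = ranks[1] != n // 2 - ranks[0]
```

With all four ranks equal to n/2, the second condition in (B-3) would require rank n/2 to equal zero, which is why random blocks fail it.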

Appendix C 

Assume $A \in \mathbb{R}^{n \times m}$ is a union of L random orthogonal
bases and $P \in \mathbb{R}^{m \times m}$ is an orthogonal 2L-block diagonal
matrix. Denote D = APQ where Q is some unknown signed
permutation matrix. We prove here that there are $[(\frac{m}{2L})!]^{2L}$
different permutation matrices $Q_D$ such that $DQ_D = A\tilde{P}$,
where $\tilde{P}$ is an orthogonal 2L-block diagonal matrix. Without
loss of generality we can assume Q = I, therefore we need to
refer to $APQ_D = A\tilde{P}$. According to Lemma 12 this implies
$PQ_D = \tilde{P}$. Since both P and $\tilde{P}$ are 2L-block diagonal,
$Q_D$ must be too, and the size of its blocks is $\frac{m}{2L} \times \frac{m}{2L}$.
$Q_D$ is a permutation matrix, therefore each of its blocks is
a permutation of the identity matrix of size $\frac{m}{2L}$. Thus, there
are only $(\frac{m}{2L})!$ different possibilities for each block. There
are 2L blocks such that the total number of possible $Q_D$'s is
$[(\frac{m}{2L})!]^{2L}$.
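
To get a feel for the size of this count, it can be evaluated for concrete dimensions. The sketch assumes the garbled block size is m / (2L), which follows from P being an m x m matrix with 2L equal diagonal blocks; the function name is hypothetical.

```python
from math import factorial

def count_block_permutations(m, L):
    """[(m / 2L)!]^(2L): one permutation of size m/(2L) for each of the 2L blocks."""
    block_size, remainder = divmod(m, 2 * L)
    if remainder:
        raise ValueError("m must be divisible by 2L")
    return factorial(block_size) ** (2 * L)

print(count_block_permutations(8, 2))  # (2!)^4 = 16
```

For the dimensions of the OBD-BCS simulations (m = 64, L = 2) the same formula gives (16!)^4, so the set of equivalent permutations is finite but very large.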